Sitemaps + robots.txt: How to Tell Google Every Page of Your Portfolio
Key Takeaways
- `sitemap.xml` is a map of every page on your site that you hand to Google
- `robots.txt` tells crawlers which paths are allowed and where to find your sitemap
- Next.js App Router lets you generate both files dynamically with `sitemap.ts` and `robots.ts`, with no manual XML editing
- Always submit your sitemap manually in Google Search Console; don't wait for Google to find it
- The Coverage and Crawl Stats reports show you exactly what Googlebot did with each URL
This post is Part 4 of the Can I Make Google Happy? series — the final installment.
Introduction
We've covered metadata, social previews, and structured data. But all of that is useless if Google can't find your pages in the first place.
This post covers the two files that control how Googlebot crawls your site:
1. `sitemap.xml`: a map of all your pages that you hand to Google
2. `robots.txt`: a rulebook that tells crawlers what they may and may not crawl
Most portfolios get this wrong — either using a static hardcoded file (which gets stale), or skipping it entirely. I'll show you the Next.js App Router way that generates both files dynamically, and how to monitor crawl health in Google Search Console's Coverage and Crawl Stats reports.
Part 1: Sitemap
What Is a Sitemap?
A sitemap is an XML file that lists every URL on your site, along with hints about each page:
- When it was last updated (`lastModified`)
- How often it changes (`changeFrequency`)
- How important it is relative to other pages (`priority`)
Google doesn't *require* a sitemap — it will crawl your site without one. But a sitemap makes crawling faster and more reliable, especially for sites that don't have many inbound links (like a new portfolio).
Static sitemap.xml vs. Dynamic sitemap.ts
The old way (a static XML file at `public/sitemap.xml`):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourname.dev</loc>
    <lastmod>2026-01-01</lastmod>
  </url>
</urlset>
```
This works — but you have to manually update the lastmod date every time you change a page. If you forget, Google gets stale information.
The Next.js way (dynamic `src/app/sitemap.ts`):
```ts
import type { MetadataRoute } from "next";

export default function sitemap(): MetadataRoute.Sitemap {
  const base = "https://yourname.dev";
  const now = new Date();

  return [
    {
      url: base,
      lastModified: now,
      changeFrequency: "monthly",
      priority: 1,
    },
    {
      url: `${base}/projects`,
      lastModified: now,
      changeFrequency: "monthly",
      priority: 0.8,
    },
    {
      url: `${base}/experience`,
      lastModified: now,
      changeFrequency: "monthly",
      priority: 0.8,
    },
    {
      url: `${base}/education`,
      lastModified: now,
      changeFrequency: "yearly",
      priority: 0.6,
    },
  ];
}
```
Next.js automatically serves this at https://yourname.dev/sitemap.xml. Every build regenerates it with the current date — no manual updates needed.
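Curious what the generated file actually contains? Here's a minimal sketch of the conversion, assuming a local `SitemapEntry` type that mirrors the fields above (in your project you'd keep using `MetadataRoute.Sitemap` and let Next.js do this for you). The tag names come from the sitemaps.org protocol; the exact formatting Next.js emits may differ.

```typescript
// Local stand-in for the sitemap.ts fields above (assumption: it mirrors
// the shape of a Next.js MetadataRoute.Sitemap entry).
type SitemapEntry = {
  url: string;
  lastModified?: Date;
  changeFrequency?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
  priority?: number;
};

// Escape XML's reserved characters so URLs containing & or quotes stay valid.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render entries as a sitemaps.org <urlset>; optional tags appear only when set.
function toSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries.map((e) => {
    const tags = [`<loc>${escapeXml(e.url)}</loc>`];
    if (e.lastModified) tags.push(`<lastmod>${e.lastModified.toISOString()}</lastmod>`);
    if (e.changeFrequency) tags.push(`<changefreq>${e.changeFrequency}</changefreq>`);
    if (e.priority !== undefined) tags.push(`<priority>${e.priority}</priority>`);
    return ["  <url>", ...tags.map((t) => `    ${t}`), "  </url>"].join("\n");
  });
  // The XML declaration has to be the very first thing in the file.
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

const xml = toSitemapXml([
  {
    url: "https://yourname.dev",
    lastModified: new Date("2026-01-01T00:00:00Z"),
    changeFrequency: "monthly",
    priority: 1,
  },
]);
console.log(xml);
```

Notice that `lastModified` becomes `<lastmod>` and `changeFrequency` becomes `<changefreq>`: the Next.js field names are camelCase, the XML tags are not.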
Breaking Down Sitemap Fields
#### url
The full absolute URL of the page. Never use relative paths here.
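A small guard can enforce the "absolute URLs only" rule. This is a sketch: `absoluteUrl` and `BASE_URL` are hypothetical names, and it leans on the standard `URL` constructor to resolve paths against your origin.

```typescript
// Assumption: your production origin. Keep it in one place.
const BASE_URL = "https://yourname.dev";

// Hypothetical helper: resolve a route path into an absolute URL on BASE_URL.
function absoluteUrl(path: string, base: string = BASE_URL): string {
  // new URL() resolves "/projects" against the base and throws on malformed input.
  const url = new URL(path, base);
  // Reject anything that resolved to a different origin (e.g. a pasted full URL).
  if (url.origin !== new URL(base).origin) {
    throw new Error(`${url.href} is not on ${base}`);
  }
  return url.href;
}

console.log(absoluteUrl("/projects")); // https://yourname.dev/projects
console.log(absoluteUrl("/"));         // https://yourname.dev/
```

The root resolves with a trailing slash; `https://yourname.dev` and `https://yourname.dev/` point at the same page, so keep whichever form matches your canonical URL.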
#### lastModified: now
new Date() gives the current timestamp at build time. Every deployment automatically updates this. Google uses it to decide whether to re-crawl the page.
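One caveat: with `new Date()`, every page claims a fresh date on every deploy, even when nothing on it changed. Google says it only uses `lastmod` when it's consistently accurate, so you may prefer tracking a real date per page. A sketch, assuming a hypothetical `PAGE_UPDATED` map you maintain by hand or generate from your content:

```typescript
// Hypothetical: the date each route's content last actually changed.
// Maintain it by hand or generate it from your content; these dates are examples.
const PAGE_UPDATED: Record<string, string> = {
  "/": "2026-01-10",
  "/projects": "2026-01-08",
  "/experience": "2025-11-20",
  "/education": "2025-06-01",
};

type DatedEntry = { url: string; lastModified: Date };

function entriesWithRealDates(base: string, updated: Record<string, string>): DatedEntry[] {
  return Object.entries(updated).map(([path, date]) => ({
    // Keep the homepage as the bare origin, matching sitemap.ts above.
    url: path === "/" ? base : `${base}${path}`,
    // "YYYY-MM-DD" strings parse as midnight UTC.
    lastModified: new Date(date),
  }));
}

const entries = entriesWithRealDates("https://yourname.dev", PAGE_UPDATED);
for (const e of entries) {
  console.log(e.url, e.lastModified.toISOString().slice(0, 10));
}
```

Inside `sitemap.ts`, you'd spread these into the returned array alongside `changeFrequency` and `priority`.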
#### changeFrequency
A hint to Googlebot about how often this page changes:
| Value | Use Case |
|---|---|
| `"always"` | Pages that change on every load (live data) |
| `"hourly"` | News/live feeds |
| `"daily"` | Active blogs |
| `"weekly"` | Portfolio projects (active updates) |
| `"monthly"` | Portfolio homepage, experience page |
| `"yearly"` | Education, static about page |
| `"never"` | Archived pages that never change |
Important: this is only a hint, not a rule. Google has said it ignores `changeFrequency` (`<changefreq>` in the XML) and relies on its own signals, such as how often the page actually changes. Other search engines may still read it.
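If your frequencies come from plain strings (say, a hypothetical `frequency` field in page frontmatter), a type guard keeps typos like `"montly"` out of the sitemap. A sketch; when you write the values inline in `sitemap.ts`, Next.js's `MetadataRoute.Sitemap` type already restricts `changeFrequency` to these seven strings.

```typescript
// The seven <changefreq> values allowed by the sitemaps.org protocol.
const CHANGE_FREQUENCIES = [
  "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
] as const;

type ChangeFrequency = (typeof CHANGE_FREQUENCIES)[number];

// Type guard: narrows an arbitrary string to a valid ChangeFrequency.
function isChangeFrequency(value: string): value is ChangeFrequency {
  return (CHANGE_FREQUENCIES as readonly string[]).includes(value);
}

// Fall back to a default instead of emitting an invalid value.
function parseChangeFrequency(value: string, fallback: ChangeFrequency = "monthly"): ChangeFrequency {
  return isChangeFrequency(value) ? value : fallback;
}

console.log(parseChangeFrequency("weekly")); // weekly
console.log(parseChangeFrequency("montly")); // monthly (the typo falls back)
```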
#### priority
A value from 0.0 to 1.0. Default is 0.5.
- `1.0` → Homepage (most important)
- `0.8` → Key pages (projects, experience)
- `0.6` → Secondary pages (education)
- `0.4` or lower → Very low priority pages

Again, this is only a hint, and it ranks pages against each other within your own site, not against other sites. Google has said it ignores `priority` altogether, so don't expect it to change how often Googlebot visits a page.
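The protocol's range and default are easy to enforce in code. A minimal sketch, with `resolvePriority` as a hypothetical helper name:

```typescript
// sitemaps.org: priority runs from 0.0 to 1.0, and an unset priority means 0.5.
const DEFAULT_PRIORITY = 0.5;

// Hypothetical helper: apply the default and reject out-of-range values.
function resolvePriority(priority?: number): number {
  if (priority === undefined) return DEFAULT_PRIORITY;
  if (!Number.isFinite(priority) || priority < 0 || priority > 1) {
    throw new RangeError(`priority must be between 0.0 and 1.0, got ${priority}`);
  }
  return priority;
}

console.log(resolvePriority());    // 0.5
console.log(resolvePriority(0.8)); // 0.8
```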
Part 2: robots.txt
What Is robots.txt?
robots.txt is a plain text file at the root of your domain. Well-behaved crawlers fetch it before crawling anything else on your site. It tells them:
- Which pages they are allowed to visit
- Which pages they should skip
- Where to find your sitemap
It is NOT a security mechanism — it's a courtesy protocol. A malicious bot will ignore it. It's for legitimate crawlers like Googlebot, Bingbot, and others.
The Problem With a Static robots.txt
A static file at `public/robots.txt` looks like this:

```text
User-agent: *
Allow: /
Sitemap: https://yourname.dev/sitemap.xml
```

This works. But it has one problem: it's a hardcoded file that lives separately from your Next.js routing system. If you change your domain or add disallow rules, you have to remember to update this file manually.
The Right Way: Dynamic src/app/robots.ts
Next.js App Router supports a robots.ts file that generates robots.txt dynamically — the same pattern as sitemap.ts:
```ts
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
    },
    sitemap: "https://yourname.dev/sitemap.xml",
  };
}
```
Next.js serves this at https://yourname.dev/robots.txt. Delete the old public/robots.txt after creating this file — you don't want two competing robots files.
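To see how the returned object maps to the file, here's a sketch of the rendering step, assuming local `RobotsRule`/`RobotsConfig` types that mirror the fields used in this post. Next.js does this for you, and its exact output formatting may differ.

```typescript
// Local stand-ins for the fields used in this post (assumption: they mirror
// the shape of Next.js's MetadataRoute.Robots).
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};

type RobotsConfig = {
  rules: RobotsRule | RobotsRule[];
  sitemap?: string | string[];
};

// Normalize "one value or many" into an array.
function toList(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// One group per rule (User-agent lines, then Allow/Disallow), then Sitemap lines.
function toRobotsTxt(config: RobotsConfig): string {
  const lines: string[] = [];
  const rules = Array.isArray(config.rules) ? config.rules : [config.rules];
  for (const rule of rules) {
    for (const agent of toList(rule.userAgent)) lines.push(`User-agent: ${agent}`);
    for (const path of toList(rule.allow)) lines.push(`Allow: ${path}`);
    for (const path of toList(rule.disallow)) lines.push(`Disallow: ${path}`);
    lines.push(""); // blank line between groups, for readability
  }
  for (const url of toList(config.sitemap)) lines.push(`Sitemap: ${url}`);
  return lines.join("\n");
}

const robotsTxt = toRobotsTxt({
  rules: { userAgent: "*", allow: "/" },
  sitemap: "https://yourname.dev/sitemap.xml",
});
console.log(robotsTxt);
```

For the basic config this prints the same three directives as the static file, which is why the dynamic version is a drop-in replacement.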
Advanced robots.ts: Blocking Specific Paths
If your portfolio has any admin routes, API routes, or private pages, you can block them:
```ts
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        // Don't add "/_next/" here: Googlebot needs the JS and CSS
        // under /_next/static/ to render your pages.
        disallow: ["/api/", "/admin/"],
      },
    ],
    sitemap: "https://yourname.dev/sitemap.xml",
  };
}
```

- `userAgent: "*"` → applies to all crawlers
- `allow: "/"` → allow everything by default
- `disallow: ["/api/"]` → block the `/api/` path and everything under it
For a simple portfolio with no private routes, the basic version (allow everything) is correct. Don't over-engineer it.
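How does a crawler decide that `/api/contact` is blocked while `/projects` isn't? RFC 9309 (the Robots Exclusion Protocol) says the most specific, meaning longest, matching rule wins, and `Allow` wins a tie. A simplified sketch that ignores the `*` and `$` wildcards; `PathRule` and `isAllowed` are hypothetical names:

```typescript
type PathRule = { type: "allow" | "disallow"; path: string };

// Simplified RFC 9309 matching: plain prefix rules only (no * or $ wildcards).
// The longest matching path wins; on a tie, allow wins. No match means allowed.
function isAllowed(urlPath: string, rules: PathRule[]): boolean {
  let best: PathRule | undefined;
  for (const rule of rules) {
    // An empty path (e.g. "Disallow:" with no value) matches nothing.
    if (rule.path === "" || !urlPath.startsWith(rule.path)) continue;
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === "allow")
    ) {
      best = rule;
    }
  }
  return best === undefined || best.type === "allow";
}

// The rules from the advanced robots.ts above.
const rules: PathRule[] = [
  { type: "allow", path: "/" },
  { type: "disallow", path: "/api/" },
  { type: "disallow", path: "/admin/" },
];

console.log(isAllowed("/projects", rules));    // true
console.log(isAllowed("/api/contact", rules)); // false
console.log(isAllowed("/api", rules));         // true
```

Notice `/api` on its own stays crawlable: `Disallow: /api/` only matches paths that include the trailing slash.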
What User-agent: * Means
```text
User-agent: *
```
The * wildcard applies this rule to ALL crawlers — Googlebot, Bingbot, DuckDuckBot, and every other well-behaved crawler.
You can also write rules for specific crawlers:
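```ts
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "Googlebot",
        allow: "/",
      },
      {
        userAgent: "AhrefsBot", // Block a specific SEO crawler
        disallow: "/",
      },
    ],
    sitemap: "https://yourname.dev/sitemap.xml",
  };
}
```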
For a portfolio, sticking with userAgent: "*" is the right call.
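When you write per-crawler rules, each crawler picks the group whose `User-agent` matches its name and falls back to the `*` group otherwise. A simplified sketch of that selection; unlike RFC 9309, it returns only the first matching group instead of merging every group that names the same crawler, and `Group`/`selectGroup` are hypothetical names:

```typescript
type Group = { userAgents: string[]; allow: string[]; disallow: string[] };

// Pick the group a crawler should obey: a group naming the crawler
// (case-insensitive) beats the "*" group. Simplification: RFC 9309 merges
// all groups that name the same crawler; this returns only the first.
function selectGroup(crawler: string, groups: Group[]): Group | undefined {
  const name = crawler.toLowerCase();
  const named = groups.find((g) => g.userAgents.some((ua) => ua.toLowerCase() === name));
  // undefined means no group applies, so nothing is blocked for this crawler.
  return named ?? groups.find((g) => g.userAgents.includes("*"));
}

// The two per-crawler rules from the example above.
const groups: Group[] = [
  { userAgents: ["Googlebot"], allow: ["/"], disallow: [] },
  { userAgents: ["AhrefsBot"], allow: [], disallow: ["/"] },
];

console.log(selectGroup("AhrefsBot", groups)?.disallow.join(", ")); // /
console.log(selectGroup("Bingbot", groups)); // undefined
```

With only those two groups and no `*` group, a crawler like Bingbot matches nothing, so nothing is blocked for it.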
The Relationship Between sitemap.ts and robots.ts
These two files work as a team:
```text
robots.txt
└── Sitemap: https://yourname.dev/sitemap.xml   ← points to sitemap

sitemap.xml
├── https://yourname.dev              (priority: 1.0)
├── https://yourname.dev/projects     (priority: 0.8)
├── https://yourname.dev/experience   (priority: 0.8)
└── https://yourname.dev/education    (priority: 0.6)
```

Flow:
1. Googlebot arrives at your site
2. It reads `robots.txt` and learns it's allowed to crawl everything
3. It follows the `Sitemap:` link in `robots.txt`
4. It reads `sitemap.xml` and gets the full list of URLs and their hints
5. It queues those URLs for crawling, deciding the order and frequency with its own signals rather than your `priority` values
This is why both files live in src/app/ — they're part of the same routing system and both get generated fresh on every deployment.
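Steps 2 and 3 of that flow boil down to scanning `robots.txt` for `Sitemap:` lines. A minimal sketch, with `sitemapUrls` as a hypothetical helper; it matches the field name case-insensitively and drops `#` comments:

```typescript
// Hypothetical helper: collect every Sitemap: URL from robots.txt text.
function sitemapUrls(robotsTxt: string): string[] {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim()) // drop comments and padding
    .filter((line) => /^sitemap\s*:/i.test(line))   // match "Sitemap:" case-insensitively
    .map((line) => line.slice(line.indexOf(":") + 1).trim())
    .filter((url) => url.length > 0);
}

const robots = [
  "User-agent: *",
  "Allow: /",
  "",
  "Sitemap: https://yourname.dev/sitemap.xml",
].join("\n");

console.log(sitemapUrls(robots).join("\n")); // https://yourname.dev/sitemap.xml
```

Against your live site, you could feed it the response body, e.g. `fetch("https://yourname.dev/robots.txt").then((r) => r.text()).then(sitemapUrls)` in Node 18+.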
Google Search Console: Submitting Your Sitemap
Don't wait for Google to find your sitemap on its own. Submit it manually:
Step 1: Open Google Search Console
Step 2: In the left sidebar → Sitemaps
Step 3: In the "Add a new sitemap" field, enter sitemap.xml (GSC prepends your domain automatically)
Step 4: Click Submit
Step 5: After a few minutes, refresh. The panel reports three things — Status: Success (Google read your sitemap), Discovered URLs (how many URLs Google found), and Last read (when Google last fetched it).
Google Search Console: Coverage Report
After submitting the sitemap, the Coverage report tells you what happened to each URL.
Step 1: GSC → Pages (previously called "Coverage")
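Step 2: The classic Coverage report grouped URLs into four categories (the current Pages report condenses these into Indexed and Not indexed, with a reason listed for each page that isn't indexed):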
| Status | Meaning |
|---|---|
| Error | Pages Google tried to index but couldn't |
| Valid with warnings | Indexed but has potential issues |
| Valid | Successfully indexed — these appear in search results |
| Excluded | Not indexed, but Google says it's intentional |
Common "Excluded" reasons for portfolios:
- "Crawled – currently not indexed" → Google crawled it but chose not to index it yet. Wait a few days.
- "Discovered – currently not indexed" → Google knows it exists but hasn't crawled it yet. Make sure the page is linked from your other pages and listed in your sitemap, then give it time.
- "Duplicate, Google chose different canonical" → You have a canonical conflict. Check your `alternates.canonical` in `layout.tsx`.
Google Search Console: Crawl Stats Report
The Crawl Stats report shows you exactly how Googlebot is behaving on your site:
Step 1: GSC → Settings → Crawl Stats
What to look for:
- Total crawl requests: how often Google visits your site. For a new portfolio, this starts low (5–20 requests/day) and increases as you get indexed.
- Average response time: should be under 500ms. Higher values point to a slow server, which can reduce your crawl budget.
- Crawl requests by response: these should all be `200 OK`. If you see many `404` responses, you likely have broken pages in your sitemap.
- File types crawled: confirms Google is reading your `robots.txt` and `sitemap.xml`.
Full Checklist: Did You Tell Google Everything?
- `src/app/sitemap.ts` created (generates `/sitemap.xml` dynamically)
- All portfolio pages included with correct `priority` and `changeFrequency`
- `lastModified: new Date()` auto-updates on every deployment
- `public/robots.txt` deleted (no duplicate robots files)
- `src/app/robots.ts` created (generates `/robots.txt` dynamically)
- `robots.ts` references the correct sitemap URL
- Verified `https://yourdomain.com/sitemap.xml` loads correctly in browser
- Verified `https://yourdomain.com/robots.txt` loads correctly in browser
- Sitemap submitted in Google Search Console
- Coverage report shows pages as "Valid"
- Crawl Stats show healthy response times and no 404 errors
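The two "Verified ... loads correctly" items can go one step further: once you've fetched the sitemap, check that every `<loc>` is usable. A sketch with a hypothetical `checkSitemap` helper; it uses a regex, which is fine for the flat sitemap generated here, but a real XML parser is safer for anything more complex.

```typescript
// Hypothetical helper: pull every <loc> out of a sitemap and flag common
// problems: relative or non-https URLs, URLs on another host, and duplicates.
function checkSitemap(xml: string, expectedHost: string): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  const locs = [...xml.matchAll(/<loc>\s*([^<]*?)\s*<\/loc>/g)].map((m) => m[1] ?? "");
  if (locs.length === 0) problems.push("no <loc> entries found");
  for (const loc of locs) {
    let url: URL;
    try {
      url = new URL(loc); // throws for relative paths like "/projects"
    } catch {
      problems.push(`not an absolute URL: ${loc}`);
      continue;
    }
    if (url.protocol !== "https:") problems.push(`not https: ${loc}`);
    if (url.host !== expectedHost) problems.push(`wrong host: ${loc}`);
    if (seen.has(url.href)) problems.push(`duplicate: ${loc}`);
    seen.add(url.href);
  }
  return problems;
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourname.dev</loc></url>
  <url><loc>https://yourname.dev/projects</loc></url>
</urlset>`;

console.log(checkSitemap(sample, "yourname.dev")); // []
```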
Wrapping Up the Series
Over these four blogs, here's what we built for the portfolio's SEO foundation:
| Blog | What We Did | GSC Report to Check |
|---|---|---|
| 01 | Metadata, canonical, robots config, font loading | URL Inspection |
| 02 | Open Graph image, Twitter Card | URL Inspection → HTML tab |
| 03 | JSON-LD Person schema | Rich Results report |
| 04 | sitemap.ts, robots.ts, crawl setup | Sitemaps, Coverage, Crawl Stats |
Each piece reinforces the others. Your metadata tells Google what you are. Your OG image tells social platforms how to represent you. Your schema tells Google's Knowledge Graph who you are. And your sitemap and robots.txt tell Googlebot exactly where to look and when to come back.
That's how you make Google happy.
Resources
- Next.js sitemap.ts: official sitemap file convention reference
- Next.js robots.ts: official robots file convention reference
- Sitemaps.org Protocol: full sitemap specification
- Google Search Console: submit your sitemap and monitor coverage
FAQ
Do I need both sitemap.ts and robots.ts?
You can have one without the other, but they're stronger together. robots.ts points crawlers to your sitemap, and sitemap.ts lists every URL you want indexed. Without robots.ts, search engines only learn where your sitemap lives if you submit it to them directly, as you do in Google Search Console.
What happens if I have both public/robots.txt and src/app/robots.ts?
The public/ file wins because it's served directly as a static asset. Delete it after creating robots.ts — otherwise the dynamic file is ignored.
How long until my pages appear in Google search?
After submitting your sitemap, expect 3–14 days for initial indexing. Some pages may stay in "Discovered – currently not indexed" for weeks. Building inbound links and consistently updating content speeds this up significantly.