Your website might have excellent content, but if search engines can’t find it, none of that matters. An XML sitemap acts as a roadmap that guides crawlers directly to your most important pages, helping them understand what to index and when to revisit.
Getting the structure right isn’t complicated, yet many websites still get it wrong. This guide breaks down exactly how XML sitemaps work, the correct format to use, and the best practices that actually make a difference for crawling and indexing.
Quick answer.
- XML sitemaps list URLs you want search engines to crawl and index
- The only required tag inside each <url> entry is <loc> (the page URL), while <lastmod> is strongly recommended
- Google ignores <priority> and <changefreq> tags entirely
- Each sitemap is limited to 50,000 URLs or 50MB uncompressed
- Always use absolute URLs with the correct protocol (https vs http)
What is an XML sitemap?
An XML sitemap is a specially formatted file that lists all the pages on your website you want search engines to discover. Written in Extensible Markup Language (XML), it provides crawlers with a structured directory of your content, along with metadata about when pages were last updated.
Think of it as a table of contents for your website that only search engines read. While visitors navigate through menus and links, bots can use your sitemap to quickly identify which pages exist and which ones deserve attention.
XML sitemaps are particularly valuable for:
- Websites with deep site architecture, where pages are many clicks away from the homepage
- New websites with few external backlinks pointing to them
- Large sites with thousands of pages that change frequently
- Sites with rich media content like videos or images that need separate indexing
According to Google’s official documentation, most websites benefit from having a sitemap, especially larger or more complex sites where important pages might otherwise be missed during standard crawling.
The correct XML sitemap format.
Every valid XML sitemap follows a specific structure defined by the sitemaps protocol. Here’s what a basic sitemap looks like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.example.com/page-one/</loc>
<lastmod>2025-01-15</lastmod>
</url>
<url>
<loc>https://www.example.com/page-two/</loc>
<lastmod>2025-01-10</lastmod>
</url>
</urlset>
The file must be UTF-8 encoded, and all data values need to be entity-escaped (special characters converted to their XML equivalents). The <urlset> tag wraps all your URLs and declares the namespace, while each <url> element contains information about a single page.
When building your sitemap, always use absolute URLs that include the protocol, the domain, and the full path. Relative URLs such as /page-one/ are not valid in a sitemap.
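As a rough illustration of the format above, here is a minimal Python sketch that builds a sitemap with the standard library. The function name build_sitemap and the (url, lastmod) input shape are assumptions made for this example, not part of the sitemaps protocol. ElementTree handles the entity escaping, so an ampersand in a URL comes out as &amp;.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return UTF-8 sitemap bytes for an iterable of (url, lastmod) pairs.

    `pages` is a hypothetical input shape chosen for this sketch.
    """
    # Register the sitemap namespace as the default so tags are written
    # without a prefix (<urlset xmlns="...">).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:  # <lastmod> is optional, so skip it when unknown
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # Special characters in text are entity-escaped during serialisation.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://www.example.com/page-one/", "2025-01-15"),
        ("https://www.example.com/search?q=a&b=c", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

Building the file with an XML library rather than string concatenation is a design choice that avoids most escaping and encoding mistakes.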
XML sitemap tags explained.
The sitemaps protocol defines four tags you can use within each <url> element. Only one is required, but understanding all of them helps you make informed decisions about your implementation.
The loc tag (required).
The <loc> tag specifies the URL of the page and is the only mandatory element. It must contain the full, absolute URL including the protocol.
<loc>https://www.example.com/products/widget/</loc>
Your URLs should always match your canonical tags. If your site uses HTTPS (which it should), every URL in the sitemap must use HTTPS. Consistency matters here because mismatches between sitemap URLs and canonical URLs can confuse search engines about which version to index.
The lastmod tag (recommended).
The <lastmod> tag indicates when the content was last meaningfully updated. Google uses this value when deciding which pages to recrawl, provided it is consistently and verifiably accurate.
<lastmod>2025-01-15</lastmod>
Bing has publicly stated that <lastmod> is a key freshness signal for both traditional search and AI-driven queries. The date format follows W3C datetime standards, either as YYYY-MM-DD or the full timestamp YYYY-MM-DDTHH:MM:SS+00:00.
One critical point: the date must reflect when the page content actually changed, not when the sitemap was generated. Setting all pages to the same date defeats the purpose entirely.
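If your CMS stores modification times as Python datetime objects, the two formats mentioned above can be produced along these lines. This is a sketch only: format_lastmod is a hypothetical helper, and it assumes that timestamps without a timezone are stored in UTC.

```python
from datetime import datetime, timezone


def format_lastmod(modified_at, date_only=False):
    """Format a datetime as a W3C datetime string suitable for <lastmod>."""
    if modified_at.tzinfo is None:
        # Assumption for this sketch: naive timestamps are in UTC.
        modified_at = modified_at.replace(tzinfo=timezone.utc)
    if date_only:
        return modified_at.date().isoformat()  # YYYY-MM-DD
    # YYYY-MM-DDTHH:MM:SS+00:00 (the offset reflects the datetime's timezone)
    return modified_at.isoformat(timespec="seconds")


if __name__ == "__main__":
    changed = datetime(2025, 1, 15, 10, 30, 0)
    print(format_lastmod(changed))                  # 2025-01-15T10:30:00+00:00
    print(format_lastmod(changed, date_only=True))  # 2025-01-15
```

The value passed in should be the time the page content changed, not the time the sitemap was generated.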
The changefreq tag (largely ignored).
The <changefreq> tag suggests how often the page content is likely to change.
<changefreq>weekly</changefreq>
Valid values are always, hourly, daily, weekly, monthly, yearly, and never. Google has confirmed that it ignores this tag entirely, though other search engines may still reference it.
The priority tag (largely ignored).
The <priority> tag indicates the relative importance of a page compared to other pages on your site, using values from 0.0 to 1.0.
<priority>0.8</priority>
Like <changefreq>, Google ignores this tag. Most SEO professionals recommend keeping your sitemaps simple by only including <loc> and accurate <lastmod> values.
Sitemap size limits and index files.
A single XML sitemap cannot exceed 50,000 URLs or 50MB when uncompressed. For smaller websites, this is rarely an issue. Larger sites need to split their content across multiple sitemaps and use a sitemap index file to organise them.
A sitemap index looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.example.com/sitemap-products.xml</loc>
<lastmod>2025-01-15</lastmod>
</sitemap>
<sitemap>
<loc>https://www.example.com/sitemap-blog.xml</loc>
<lastmod>2025-01-14</lastmod>
</sitemap>
</sitemapindex>
For enterprise SEO strategies, segmenting sitemaps by content type (products, blog posts, category pages) makes monitoring and troubleshooting significantly easier.
The referenced sitemaps must be hosted on the same site as your sitemap index file, and they must be in the same directory or lower in your site hierarchy.
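Splitting a large URL list and producing the index can be sketched in Python as follows. The helper names chunk_urls and build_sitemap_index are assumptions for this example, and the sketch only enforces the 50,000 URL count. It does not measure the 50MB size limit, which would need a separate check on the serialised file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into groups of at most `size` entries."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemaps):
    """Return sitemap index bytes for an iterable of (loc, lastmod) pairs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    all_urls = [f"https://www.example.com/p/{i}/" for i in range(120_001)]
    print([len(chunk) for chunk in chunk_urls(all_urls)])
    print(build_sitemap_index([
        ("https://www.example.com/sitemap-products.xml", "2025-01-15"),
        ("https://www.example.com/sitemap-blog.xml", "2025-01-14"),
    ]).decode("utf-8"))
```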
Types of XML sitemaps.
Beyond standard page sitemaps, specialised sitemap types help search engines discover and understand different content formats.
Image sitemaps.
Image sitemaps help Google discover images that might be loaded via JavaScript or otherwise missed during standard crawling. You can create a separate image sitemap or add image tags to your existing sitemap using the image namespace, which is declared on the <urlset> element as xmlns:image="http://www.google.com/schemas/sitemap-image/1.1".
<url>
<loc>https://www.example.com/product-page/</loc>
<image:image>
<image:loc>https://www.example.com/images/product.jpg</image:loc>
</image:image>
</url>
This approach is particularly valuable for ecommerce sites where product images need visibility in Google Image Search.
Video sitemaps.
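Video sitemaps provide additional metadata about video content, including titles, descriptions, thumbnails, and duration. This information enables rich results in search. Each <video:video> entry needs a thumbnail, a title, and a description, plus either <video:content_loc> (the video file itself) or <video:player_loc> (an embeddable player).
<url>
<loc>https://www.example.com/video-page/</loc>
<video:video>
<video:thumbnail_loc>https://www.example.com/thumb.jpg</video:thumbnail_loc>
<video:title>Product Demo</video:title>
<video:description>A quick overview of our product features.</video:description>
<video:content_loc>https://www.example.com/videos/product-demo.mp4</video:content_loc>
</video:video>
</url>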
News sitemaps.
News sitemaps are specifically designed for articles published within the last 48 hours. They function as a breaking news feed for Google News and should only contain recent content.
<url>
<loc>https://www.example.com/news/article/</loc>
<news:news>
<news:publication>
<news:name>Example News</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2025-01-15T10:00:00+00:00</news:publication_date>
<news:title>Breaking News Headline</news:title>
</news:news>
</url>
Hreflang sitemaps.
For multilingual or multi-regional websites, hreflang annotations can be included in your sitemap to indicate language and regional variations of pages.
<url>
<loc>https://www.example.com/</loc>
<xhtml:link rel="alternate" hreflang="en-au" href="https://www.example.com/" />
<xhtml:link rel="alternate" hreflang="en-gb" href="https://www.example.co.uk/" />
</url>
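This complements on-page hreflang tags and helps ensure search engines serve the correct version to users in different locations. Each language or regional version needs its own <url> entry, and every entry must list all of the alternates, including itself. The xhtml prefix is declared on the <urlset> element as xmlns:xhtml="http://www.w3.org/1999/xhtml".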
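Because every regional version has to carry the full set of alternates, hreflang entries are easy to get wrong by hand. The following Python sketch generates one <url> element per variant from a single mapping. The function name build_hreflang_entries and the dictionary input are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_entries(alternates):
    """Return one <url> element per variant, each listing every alternate.

    `alternates` maps an hreflang code to the URL of that version
    (a hypothetical input shape for this sketch).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    entries = []
    for url in alternates.values():
        entry = ET.Element(f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # Every entry repeats the full set of alternates, including itself.
        for lang, href in alternates.items():
            ET.SubElement(
                entry,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
        entries.append(entry)
    return entries


if __name__ == "__main__":
    for element in build_hreflang_entries({
        "en-au": "https://www.example.com/",
        "en-gb": "https://www.example.co.uk/",
    }):
        print(ET.tostring(element, encoding="unicode"))
```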
XML sitemap best practices.
Following these guidelines ensures your sitemap actually helps search engines rather than creating confusion.
Only include indexable URLs.
Every URL in your sitemap should return a 200 status code, be accessible to crawlers, and have no conflicting directives. Pages with noindex tags, those blocked by robots.txt, or those that redirect should never appear in your sitemap.
If you’re experiencing Google indexing issues, checking for conflicts between your sitemap and other directives is often a good starting point.
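The three conditions above (200 status, no noindex, not blocked by robots.txt) can be checked before a URL is written to the sitemap. The sketch below assumes you already have crawl records as dictionaries with url, status, and noindex keys. That record shape and the function name indexable_urls are hypothetical. The robots.txt check uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def indexable_urls(pages, robots_txt, user_agent="Googlebot"):
    """Return the URLs that are safe to list in a sitemap.

    `pages` is an iterable of dicts with 'url', 'status' and 'noindex'
    keys (a hypothetical crawl record format for this sketch).
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return [
        page["url"]
        for page in pages
        if page["status"] == 200                      # no redirects or errors
        and not page["noindex"]                       # no noindex directive
        and robots.can_fetch(user_agent, page["url"])  # not blocked by robots.txt
    ]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /private/\n"
    crawl = [
        {"url": "https://www.example.com/a/", "status": 200, "noindex": False},
        {"url": "https://www.example.com/private/x/", "status": 200, "noindex": False},
        {"url": "https://www.example.com/old/", "status": 301, "noindex": False},
        {"url": "https://www.example.com/thin/", "status": 200, "noindex": True},
    ]
    print(indexable_urls(crawl, robots_txt))
```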
Match sitemap URLs to canonical tags.
The URLs in your sitemap must exactly match your canonical URLs. This includes protocol (http vs https), www vs non-www, and trailing slashes. A mismatch signals inconsistency that can delay or prevent indexing.
Common URL structure problems like inconsistent trailing slashes often manifest as sitemap errors in Search Console.
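To see which kind of mismatch you are dealing with, a sitemap URL can be compared against the canonical URL part by part. This is a minimal Python sketch. The function name canonical_mismatches and the issue labels are assumptions for this example.

```python
from urllib.parse import urlsplit


def _strip_www(host):
    """Drop a leading 'www.' so hosts can be compared without it."""
    return host[4:] if host.startswith("www.") else host


def canonical_mismatches(sitemap_url, canonical_url):
    """Return a list of labels describing how the two URLs differ."""
    a, b = urlsplit(sitemap_url), urlsplit(canonical_url)
    issues = []
    if a.scheme != b.scheme:
        issues.append("protocol")
    host_a, host_b = a.hostname or "", b.hostname or ""
    if host_a != host_b:
        same_without_www = _strip_www(host_a) == _strip_www(host_b)
        issues.append("www" if same_without_www else "host")
    if a.path != b.path:
        same_without_slash = a.path.rstrip("/") == b.path.rstrip("/")
        issues.append("trailing slash" if same_without_slash else "path")
    if a.query != b.query:
        issues.append("query")
    return issues


if __name__ == "__main__":
    print(canonical_mismatches(
        "http://example.com/page",
        "https://www.example.com/page/",
    ))
```

An empty list means the sitemap URL and the canonical URL agree on protocol, host, path, and query string.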
Keep your sitemap updated.
Your sitemap should reflect the current state of your website. When you publish new content, update existing pages, or remove old ones, your sitemap needs to change accordingly.
Dynamic sitemaps that automatically update when content changes are far more effective than static files that require manual maintenance. Most content management systems and SEO plugins handle this automatically.
Submit through Google Search Console.
While search engines can discover your sitemap through robots.txt, submitting it directly through Google Search Console provides valuable feedback about errors and indexing status.
If your website is not being indexed, Search Console’s sitemap reports often reveal the underlying issues.
Reference your sitemap in robots.txt.
Adding a sitemap directive to your robots.txt file ensures search engines can find it:
Sitemap: https://www.example.com/sitemap.xml
This simple addition makes discovery reliable and is considered a best practice even when you’ve also submitted through Search Console.
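To confirm the directive is actually being picked up, you can parse your robots.txt the way a crawler library would. The sketch below uses site_maps() from Python's urllib.robotparser (available from Python 3.8). The wrapper name sitemaps_from_robots is an assumption for this example, and the robots.txt content is passed in as a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    robots_txt = (
        "User-agent: *\n"
        "Disallow: /cart/\n"
        "Sitemap: https://www.example.com/sitemap.xml\n"
    )
    print(sitemaps_from_robots(robots_txt))
```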
Use compression for large sitemaps.
Compressing sitemaps with gzip reduces bandwidth usage and speeds up transfer. The compressed file uses a .xml.gz extension, and Google handles decompression automatically. The 50MB limit still applies to the uncompressed size.
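Compression itself is a one-liner with Python's standard gzip module. The sketch below adds a guard for the uncompressed size limit first. The function name compress_sitemap is an assumption for this example.

```python
import gzip

# 50MB, as defined by the sitemaps protocol (52,428,800 bytes)
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024


def compress_sitemap(xml_bytes):
    """Gzip sitemap bytes after checking the uncompressed size limit."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("sitemap exceeds 50MB uncompressed; split it first")
    return gzip.compress(xml_bytes)


if __name__ == "__main__":
    sample = b"<url><loc>https://www.example.com/</loc></url>" * 1000
    packed = compress_sitemap(sample)
    print(len(sample), "bytes uncompressed,", len(packed), "bytes compressed")
```

The returned bytes would then be written to a file such as sitemap.xml.gz and referenced by that name in your sitemap index or robots.txt.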
Common XML sitemap mistakes to avoid.
Even experienced webmasters make these errors. Avoiding them prevents wasted crawl budget and indexing delays.
Including non-200 status codes.
URLs that return 404 or 500 errors, or that 301-redirect to another address, waste crawler resources. Every URL in the sitemap should return a 200 status directly. Regular auditing catches these issues before they accumulate.
Mixing HTTP and HTTPS URLs.
If your site uses HTTPS (and it should), every URL in your sitemap must use HTTPS. Mixed protocols create canonicalisation confusion and can prevent proper indexing.
Setting identical lastmod dates.
When every page shows the same modification date, the signal becomes meaningless. Only update <lastmod> when content genuinely changes, and ensure your CMS tracks this accurately.
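A quick way to spot this problem is to count how many entries share the same date. In the Python sketch below, the function name dominant_lastmod and the 90% threshold are assumptions chosen for illustration, so pick a threshold that suits how often your site really changes.

```python
from collections import Counter


def dominant_lastmod(lastmods, threshold=0.9):
    """Return (date, share) if one date covers at least `threshold` of entries.

    Returns None when the dates look reasonably varied. The default
    threshold of 0.9 is an arbitrary value for this sketch.
    """
    # Compare on the date part only, so timestamps on the same day match.
    counts = Counter(value[:10] for value in lastmods if value)
    if not counts:
        return None
    date, hits = counts.most_common(1)[0]
    share = hits / sum(counts.values())
    return (date, share) if share >= threshold else None


if __name__ == "__main__":
    print(dominant_lastmod(["2025-01-15"] * 10))
    print(dominant_lastmod(["2025-01-15", "2025-01-10", "2024-12-02"]))
```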
Exceeding size limits.
Sitemaps over 50,000 URLs or 50MB are invalid and may be partially ignored or rejected entirely. Monitor your sitemap size and split into multiple files before hitting limits.
Including duplicate URLs.
Parameter variations, trailing slash inconsistencies, and other URL variants inflate your sitemap without benefit. Include only canonical versions.
Forgetting to update after migrations.
Site migrations often break sitemaps. Old URLs, incorrect protocols, and outdated structures persist unless specifically addressed. Post-migration audits should always include sitemap verification.
Our complete technical SEO guide covers migration checklists and other common technical issues in detail.
How to create an XML sitemap.
Most websites don’t need to build sitemaps manually. Modern CMS platforms and SEO plugins generate them automatically.
WordPress sites typically use plugins like Yoast SEO or Rank Math that create and maintain sitemaps without manual intervention. Shopify, Wix, and Squarespace include built-in sitemap generation.
For custom-built sites, sitemap generation can be handled through your database, pulling URLs programmatically and outputting valid XML. Our guide on creating XML sitemaps covers the technical implementation in more detail.
For very small sites (under a few dozen pages), manual creation in a text editor remains viable. Simply follow the format outlined earlier and save as sitemap.xml in your root directory.
How to validate your XML sitemap.
Before submitting, validate your sitemap to catch formatting errors and structural problems.
Google Search Console provides the most authoritative validation, highlighting specific errors and warning about URLs that can’t be crawled. Third-party validators like the XML Sitemap Validator at xmlsitemapvalidator.com offer quick syntax checks without requiring site ownership verification.
Common validation errors include missing or incorrect namespace declarations, improperly formatted dates, URLs containing invalid characters, exceeding size limits, and malformed XML syntax.
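Several of these checks are simple enough to run locally before you submit anything. The following Python sketch covers malformed XML, the namespace, the size limits, absolute URLs, and date formats. It is not a replacement for Search Console: validate_sitemap is a hypothetical helper, and the date pattern is a simplified version of the W3C datetime format.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024
# Simplified check: a date, optionally followed by a time with Z or an offset.
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def validate_sitemap(xml_bytes):
    """Return a list of human-readable problems found in a sitemap."""
    errors = []
    if len(xml_bytes) > MAX_BYTES:
        errors.append("file exceeds 50MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return errors + [f"malformed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return errors + ["root element is not <urlset> in the sitemaps namespace"]
    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > MAX_URLS:
        errors.append(f"{len(entries)} URLs exceeds the 50,000 limit")
    for entry in entries:
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            errors.append(f"not an absolute URL: {loc!r}")
        lastmod = entry.findtext(f"{{{SITEMAP_NS}}}lastmod")
        if lastmod is not None and not W3C_DATE.match(lastmod.strip()):
            errors.append(f"bad lastmod for {loc!r}: {lastmod!r}")
    return errors


if __name__ == "__main__":
    sample = (
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>/page-one/</loc><lastmod>15/01/2025</lastmod></url>"
        b"</urlset>"
    )
    for problem in validate_sitemap(sample):
        print(problem)
```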
Understanding technical SEO ranking factors helps contextualise why proper sitemap implementation matters for overall search performance.
Monitoring sitemap performance.
Creating a valid sitemap is just the beginning. Ongoing monitoring ensures it continues serving its purpose.
In Google Search Console, the Sitemaps report shows submission status, discovered URLs, and any errors encountered during processing. The Page indexing report (previously called Index Coverage) reveals whether URLs from your sitemap are actually being indexed, and if not, why.
Pay attention to the gap between URLs submitted and URLs indexed. A large difference suggests issues with content quality, duplicate content, or crawlability problems beyond the sitemap itself.
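If you export the list of indexed URLs and compare it with your sitemap, the gap can be measured directly. The Python sketch below assumes both lists are already available as plain URL strings. The function name indexing_gap is hypothetical, and how you obtain the indexed list is up to your tooling.

```python
def indexing_gap(submitted, indexed):
    """Return (indexed share, sorted list of submitted URLs not indexed)."""
    submitted, indexed = set(submitted), set(indexed)
    missing = sorted(submitted - indexed)
    # Share of submitted URLs that appear in the indexed list.
    rate = len(submitted & indexed) / len(submitted) if submitted else 0.0
    return rate, missing


if __name__ == "__main__":
    rate, missing = indexing_gap(
        submitted=[
            "https://www.example.com/a/",
            "https://www.example.com/b/",
            "https://www.example.com/c/",
            "https://www.example.com/d/",
        ],
        indexed=["https://www.example.com/a/", "https://www.example.com/b/"],
    )
    print(f"{rate:.0%} indexed; not indexed: {missing}")
```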
Regular audits (monthly for active sites) catch problems before they impact rankings significantly. Automated monitoring tools can alert you to sudden changes in indexed page counts or new errors appearing.



