What is an XML sitemap?
An XML sitemap is a list of your site's files that gives search engines hints about how to crawl your website. Sitemaps help search engines find and classify content on your site that they may not have found on their own.
Sitemaps come in a variety of formats and can highlight many types of content, including video, images, news, and mobile.
An XML sitemap is different from an HTML sitemap, which exists to help users navigate a website. Although search engines crawl sites link by link, an HTML sitemap is not a replacement for an XML sitemap.
XML sitemaps can be submitted through Google Search Console and Bing Webmaster Tools. They should also be listed in the robots.txt file. Including files in a sitemap doesn't guarantee that those files will be indexed by search engines.
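As a rough illustration of what such a file contains, the sketch below builds a minimal sitemap with Python's standard library and produces the line that would point to it from robots.txt. The function names and the example.com URLs are placeholders chosen for this example, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def robots_sitemap_line(sitemap_url):
    """Return the robots.txt directive that points crawlers at the sitemap."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    print(build_sitemap(["https://example.com/", "https://example.com/about"]))
    print(robots_sitemap_line("https://example.com/sitemap.xml"))
```

A real sitemap file would normally also start with an XML declaration and may carry optional fields such as lastmod; those are left out here to keep the sketch short.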
Maintaining clean sitemaps
Keeping a clean sitemap reduces the crawl budget wasted on pages you don't want served in search results.
Google won’t necessarily disregard your sitemap if there’s more than 1% "dirt" (i.e. files that can't be indexed, like redirected URLs, broken URLs, or non-canonical URLs).
However, it still wastes crawl budget to have these pages submitted and crawled by Googlebot. Bing, on the other hand, has been noncommittal about whether it will “trust your sitemap less” if more than 1% dirt is found.
The index ratio (the ratio of pages submitted to pages indexed) is a good indicator of how efficiently your site is being crawled. You should aim for a 1:1 index ratio in Google Search Console.
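To make the arithmetic concrete, here is a small sketch that expresses the index ratio as indexed pages divided by submitted pages, so that a 1:1 ratio comes out as 1.0. The function name and the page counts are made up for illustration; the real numbers would come from your Search Console reports.

```python
def index_ratio(submitted, indexed):
    """Return indexed / submitted; 1.0 means every submitted page is indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive number of pages")
    return indexed / submitted


if __name__ == "__main__":
    # Hypothetical counts: 200 pages submitted, 150 indexed.
    print(f"{index_ratio(200, 150):.0%} of submitted pages are indexed")
```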
If you’ve submitted sitemaps and noticed that the ratio is quite low, or that certain pages aren’t being indexed, look for the following:
- Non-200 pages in your sitemap;
- Non-canonicalized pages in your sitemap;
- Improper use of nofollow or noindex tags.
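The checks above could be sketched roughly as follows. This assumes you already have crawl data for each sitemap URL (status code, declared canonical URL, and whether a noindex directive was found); the SitemapEntry type, its field names, and the sample data are hypothetical and only for illustration. Fetching the pages is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class SitemapEntry:
    """Crawl results for one sitemap URL (hypothetical structure)."""
    url: str
    status_code: int
    canonical_url: str
    noindex: bool


def find_dirty_entries(entries):
    """Return (url, reason) pairs for entries that can't be indexed as submitted."""
    problems = []
    for entry in entries:
        if entry.status_code != 200:
            problems.append((entry.url, f"non-200 status ({entry.status_code})"))
        elif entry.canonical_url != entry.url:
            problems.append((entry.url, "canonical points to another URL"))
        elif entry.noindex:
            problems.append((entry.url, "noindex directive present"))
    return problems


if __name__ == "__main__":
    # Made-up crawl results for illustration only.
    crawl = [
        SitemapEntry("https://example.com/", 200, "https://example.com/", False),
        SitemapEntry("https://example.com/old", 301, "https://example.com/old", False),
        SitemapEntry("https://example.com/a?ref=1", 200, "https://example.com/a", False),
        SitemapEntry("https://example.com/private", 200, "https://example.com/private", True),
    ]
    dirty = find_dirty_entries(crawl)
    for url, reason in dirty:
        print(f"{url}: {reason}")
    print(f"dirt share: {len(dirty) / len(crawl):.0%}")
```

Each entry is reported once, with the first problem found, so the dirt share is a simple count of affected URLs divided by the number of URLs submitted.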