How to Improve Your Site's Crawl Efficiency
Crawling and indexing are just the first steps in getting your pages to appear in search results. Although they are two basics of optimizing your site for search, many factors determine whether your pages rank from day one or have to "crawl" their way back to the top. The background and tips in this guide can help whether you're adding a single new page to your site or launching a new site entirely.
This comprehensive site crawl guide will cover:
- Googlebot crawling
- What crawl efficiency is
- Factors that negatively affect website crawlability
- Three tips to improve your site's crawlability
Googlebot Crawling
Since this guide addresses tactical questions related to crawl efficiency, we won’t cover the definitions of these activities. If you'd like more background before reading on, Google provides a brief introduction to how Google discovers and indexes sites on their Search Central blog.
First, we'll cover the basic characteristics of crawling before reviewing methods to optimize crawl efficiency. Although this article refers to Googlebot, it can be used as a general guide for all search engines.
Crawling is Googlebot's main priority. Making your site easy to crawl ensures that Googlebot discovers your high-quality content, optimized metadata, linking strategy, and every other SEO effort implemented on your site.
Characteristics of Googlebot Crawling:
- Site-wide events like a site move or relaunch will trigger increased crawl demand
- If you have a small site (fewer than several thousand URLs), your site will be crawled efficiently most of the time
- Crawling is not a direct ranking factor, but if pages on your site cannot be crawled, they will not rank
- Googlebot ignores the crawl-delay directive in your robots.txt file
- Google doesn’t index everything it crawls and doesn’t show everything it indexes
- Google works to serve its users, so it will more regularly crawl and index popular pages on well-known sites to keep its index up-to-date
What is Crawl Efficiency?
Crawl efficiency is how seamlessly bots are able to crawl all the pages on your site. A clean site structure, reliable servers, error-free sitemaps and robots.txt files, and optimized site speed all improve crawl efficiency. If your site checks off all these boxes, you will maximize your site's crawl budget, rate, efficiency, crawlability, and indexation.
To optimize crawl prioritization and efficiency on your site, know the different facets that make up crawl efficiency. Crawl efficiency depends heavily on the following:
Crawl Budget
Every site has a crawl budget. The crawl budget combines the number of pages Googlebot wants to crawl (crawl demand) with the number of pages Googlebot can crawl (crawl rate). Once Googlebot "spends" its crawl budget, it stops crawling a site, even if it hasn't reached every page you intend to get indexed. This is where robots directives come into play: robots.txt rules and robots meta tags can keep Googlebot from spending crawl budget on pages that should not be served up in search results.
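For example, a page that you don't want served in search results can carry a standard robots meta tag in its <head>. The snippet below is a minimal illustration; the page it describes is hypothetical:

```html
<!-- Hypothetical order-confirmation page that should not appear in search results -->
<head>
  <!-- Tells compliant crawlers not to index this page -->
  <meta name="robots" content="noindex">
</head>
```

Keep in mind that Googlebot has to fetch a page before it can see a noindex tag, so robots.txt disallow rules are the more direct way to save crawl budget on pages you never want fetched at all.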
As stated before, if your site is small, your crawl efficiency is already higher than that of a multi-million-page site because there simply are fewer pages to crawl. Any URL crawled by Googlebot, whether it's an alternate URL (e.g. AMP or hreflang pages), a URL with parameters, or embedded content (e.g. CSS, JavaScript), counts toward your site's crawl budget, which is why a clean sitemap and a well-maintained robots.txt file are so important.
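As a rough sketch, a robots.txt file along these lines keeps crawlers away from low-value parameter URLs while leaving CSS and JavaScript crawlable. The paths and parameter names below are placeholders, not recommendations for any specific site:

```
# Hypothetical example only; adapt paths and parameters to your own site
User-agent: *
# Internal site-search results
Disallow: /search/
# Session identifiers and faceted-navigation filters
Disallow: /*?sessionid=
Disallow: /*?sort=

# Do not disallow CSS or JavaScript; Googlebot needs them to render pages
Sitemap: https://www.example.com/sitemap.xml
```

Google supports the * wildcard in robots.txt rules, so parameter-based patterns like these are possible; test any new rules before relying on them.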
Crawl Rate Limit
Every site also has a crawl rate limit. This limit is the maximum fetching rate for your site and represents the maximum number of simultaneous parallel connections that can be used to crawl the site, along with the necessary waiting time between these fetches. The purpose of the crawl rate limit is to avoid inundating your server with requests, which could slow down your site for human users. A reliable server and quick page loading can increase your crawl rate limit, helping your site to get crawled more often by Googlebot.
Factors That Negatively Affect Website Crawlability
The following factors can negatively affect your site's crawlability:
- Slow server responses or a significant number of 5XX errors
- A significant number of low-value-add pages. These can include:
  - Many versions of a page with URL parameters that offer useless filtering, faceted navigation, or session identifiers
  - Duplicate content within your site
  - Low-quality pages
  - Spammy pages
- Long redirect chains (see the sketch after this list)
- Long page-load times that may time out
- Nonstrategic use of noindex and nofollow tags
- Pages served up through AJAX without links in the page source
- Blocking bots from crawling JavaScript and CSS files
- “Dirt” in your sitemap
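To spot long redirect chains like those mentioned above, a small script can follow each hop and report how many redirects a URL passes through. This is a minimal sketch using the third-party requests library; the URLs are placeholders:

```python
# Minimal redirect-chain checker (sketch).
# Requires the third-party "requests" package; the URLs below are placeholders.
import requests

urls = [
    "https://www.example.com/old-page",
    "https://www.example.com/category?sessionid=123",
]

for url in urls:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    hops = len(resp.history)  # one entry per redirect hop
    if hops > 1:
        chain = " -> ".join(r.url for r in resp.history) + " -> " + resp.url
        print(f"{hops} redirects: {chain}")
    else:
        print(f"OK ({resp.status_code}): {url}")
```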
Three Tips to Improve Your Site's Crawlability
To optimize crawl efficiency, address the factors above that negatively affect crawling and indexing. Here's how to discover areas to improve.
Reliable Servers For Improved Crawl Health
Just as server errors or slow server responses can reduce your crawl rate limit, a quick-responding server can improve your crawl rate limit. An increased crawl rate limit allows more connections to crawl your site.
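One way to keep an eye on crawl health is to pull Googlebot's requests out of your server access logs and count error responses. The sketch below assumes an Apache/Nginx combined-format log at a hypothetical path; adjust the parsing for your own setup:

```python
# Sketch: count Googlebot requests and 5XX errors in an access log.
# Assumes combined log format; the log path is a placeholder.
# Note: user-agent strings can be spoofed; verify Googlebot via reverse DNS
# if you need a rigorous audit.
from collections import Counter

status_counts = Counter()

with open("/var/log/nginx/access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        parts = line.split('"')
        try:
            # In combined format, the status code follows the quoted request line
            status = parts[2].split()[0]
        except IndexError:
            continue
        status_counts[status] += 1

total = sum(status_counts.values())
errors = sum(n for code, n in status_counts.items() if code.startswith("5"))
print(f"Googlebot requests: {total}, 5XX responses: {errors}")
print(status_counts.most_common())
```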
💡 TIP: You can increase the “Crawl Rate Limit” in the Google Search Console. However, although reducing it lessens the crawling of your site, increasing it doesn’t necessarily increase crawling.
Site Speed to Increase Crawl Rate Limit
Tying into the previous factor, a faster site is a sign of reliable servers and increases the crawl rate. In addition, having a faster site means fewer timeouts—and fewer pages wasting your site’s crawl budget.
Sitemap and Robots.txt Files to Optimize Crawl Budget
Keeping a clean sitemap and up-to-date directives in your robots.txt file reduces the crawl budget wasted on pages not intended for search results. Search-parameter URLs count toward your crawl budget unless you block them, which can keep bots from reaching more important pages on your site.
Although Google won't necessarily disregard your sitemap if more than 1% of its URLs are dirt (i.e. URLs that don't return a 200 response code), submitting those pages for crawling still wastes crawl budget. Bing, on the other hand, has been less definitive about whether it will “trust your sitemap less” once dirt exceeds 1%. Even though Google is the preeminent search engine, you can't afford to leave 33% of search traffic on the table over such a simple mistake.
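A quick way to measure dirt is to fetch every URL listed in the sitemap and flag anything that doesn't return a 200. The sketch below uses the third-party requests library and a placeholder sitemap URL:

```python
# Sketch: flag sitemap "dirt" (URLs that don't return a 200 response).
# Uses the third-party "requests" package; the sitemap URL is a placeholder.
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

dirt = []
for url in urls:
    # HEAD keeps the check lightweight; some servers answer HEAD differently
    # than GET, so fall back to GET if the results look off
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        dirt.append((url, status))

print(f"{len(dirt)} of {len(urls)} sitemap URLs are dirt "
      f"({len(dirt) / max(len(urls), 1):.1%})")
for url, status in dirt:
    print(status, url)
```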
The index ratio (the ratio of pages submitted to pages indexed) is a great indicator of how efficiently your site is being crawled. Aim for a 1:1 index ratio in Google Search Console. If you've submitted sitemaps and noticed a low ratio, look for the following:
- Non-200 pages in your sitemap
- Non-canonical pages in your sitemap (see the sketch after this list)
- Spider traps not blocked in your robots.txt file
- Improper use of nofollow or noindex tags
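To catch non-canonical URLs in a sitemap, you can compare each URL against the canonical it declares. This is a minimal sketch using the third-party requests and beautifulsoup4 packages; the URL list is a placeholder you would normally load from your sitemap:

```python
# Sketch: check whether sitemap URLs declare themselves as canonical.
# Requires the third-party "requests" and "beautifulsoup4" packages;
# the URLs below are placeholders.
import requests
from bs4 import BeautifulSoup

urls = [
    "https://www.example.com/page-a",
    "https://www.example.com/page-b",
]

for url in urls:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    canonical = None
    for link in soup.find_all("link"):
        if "canonical" in (link.get("rel") or []):
            canonical = link.get("href")
            break
    if canonical and canonical.rstrip("/") != url.rstrip("/"):
        print(f"Non-canonical sitemap URL: {url} -> canonical is {canonical}")
```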
💡 TIP: Breaking up a single sitemap into multiple sitemaps can help identify areas of your site that aren't getting indexed.
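One common way to split a sitemap is with a sitemap index file that points to per-section child sitemaps, as in the sketch below (the section names and URLs are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sitemap index splitting the site into sections -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-categories.xml</loc>
  </sitemap>
</sitemapindex>
```

Search Console reports on each submitted sitemap separately, so a section with a lagging index ratio stands out quickly.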
Optimize Your Crawl Budget
Understanding these elements and their roles in your site's crawlability and indexation, elementary as they are, is essential for successful search optimization.
Not sure where to start? We can help! Check out our Technical SEO services.
This blog post was originally published on January 25, 2017, and was updated and republished on April 11, 2024.