However, for a long time there was no official definition of what a crawl budget actually is. Google has since changed this: on 1/16/2017 it explained the concept in more detail in an article on its Webmaster Central Blog.
To ensure that search engines such as Google can deliver current and relevant results for users’ queries, bots must crawl the web continuously. Crawling is the prerequisite for your website to be indexed and to appear in the search results. However, Googlebot cannot crawl every website all the time; its activity on each site is limited (the so-called “crawl budget”). In the article by Gary Illyes dated 1/16/2017, the crawl budget of Googlebot is defined as follows:
“Taking crawl rate and crawl demand together we define crawl budget as the number of URLs Googlebot can and wants to crawl.” (Gary Illyes, Google)
This definition is supplemented by further explanations in the article.
The following lessons can be drawn from it:
1. The crawl budget is important for the indexing of your website
The larger the budget available to Googlebot, the more pages of your domain it can crawl, and the more content can subsequently be indexed and appear in the SERPs.
2. Domains with fewer than 1,000 URLs are simpler
According to Gary Illyes, the crawl budget is not something webmasters need to worry about if new pages tend to be crawled on the day they are published. Nor does the crawl budget play an important role for domains with fewer than 1,000 URLs, as Googlebot can crawl that number efficiently. For larger sites, additional factors come into play, such as the server capacity of the website and the prioritization of the URLs to be crawled.
3. The faster the site, the better the crawl rate
Every webmaster who engages with SEO knows that fast websites have a positive impact on usability. Fast server responses also benefit crawling, however. The faster a website responds, the more frequently Googlebot crawls it and the more simultaneous connections it can use. If you would like to make optimal use of Googlebot's crawl budget, you should pay attention to fast servers and fast-loading pages.
The crawl rate can also be limited in Google Search Console. This allows the webmaster to reduce the crawl frequency in order to conserve server capacity. Setting a higher limit, however, does not automatically lead to more frequent crawling.
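To get a rough feeling for how quickly your server answers, you can time a handful of requests yourself. The following is only a minimal sketch: the function names and the 0.5-second threshold are assumptions made for illustration, not values published by Google, and the `fetcher` parameter exists so that the timing logic can be tried without real network access.

```python
import time
from typing import Callable, Dict, Iterable, List
from urllib.request import urlopen


def fetch(url: str) -> None:
    """Download the URL and discard the body (performs a real HTTP request)."""
    with urlopen(url, timeout=10) as response:
        response.read()


def measure_response_times(
    urls: Iterable[str], fetcher: Callable[[str], None] = fetch
) -> Dict[str, float]:
    """Return the wall-clock seconds needed to fetch each URL once."""
    timings = {}
    for url in urls:
        start = time.perf_counter()
        fetcher(url)
        timings[url] = time.perf_counter() - start
    return timings


def slow_urls(timings: Dict[str, float], threshold_seconds: float = 0.5) -> List[str]:
    """List URLs slower than the (assumed) threshold, slowest first."""
    slow = [url for url, seconds in timings.items() if seconds > threshold_seconds]
    return sorted(slow, key=lambda url: timings[url], reverse=True)
```

Calling `slow_urls(measure_response_times(["https://example.com/"]))` would then list the pages worth a closer look. A single measurement from one location is noisy, so treat the result as a hint rather than a benchmark.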
4. Crawling is not a factor for ranking
A higher crawling frequency does not necessarily lead to better positions in the search results. (Gary Illyes, Google)
It is important to understand that crawling alone is not a ranking factor. However, the more extensively and frequently your website is crawled, the better its chances of ranking well.
This is because the search engine's algorithms can then keep their assessment of how well your site matches a search query up to date.
5. The need for crawling depends on various factors
Googlebot makes its use of the crawl budget dependent on how much your site actually needs to be crawled. Thus, it does not have to exhaust your entire budget all at once.
The bot crawls URLs that, according to Illyes, are “more popular” more frequently. Popularity on the internet is usually indicated by the number of incoming backlinks, so a website with more inbound links is likely to be crawled more often. A site can also be popular because it contains very current information and is continually updated, such as a news site. Unfortunately, Illyes does not elaborate on “popularity” any further. It is clear, however, that Googlebot also sees a need to recrawl URLs that are already indexed so that they do not become stale in the index. But here too the statement remains unspecific. A site move, by contrast, is a clear signal for Googlebot to recrawl the site.
6. All URLs on a site are taken into consideration for the crawl budget
Googlebot follows all URLs on your site, so all URLs count toward the crawl budget. It does not matter whether they are embedded resources (such as CSS and JavaScript), alternate URLs for hreflang, or AMP versions. If you want to conserve the crawl budget, you should remove superfluous URLs from your site.
7. There are factors that negatively impact the crawl budget
Google must use its resources economically, as a huge number of URLs must be crawled every day. A corporation such as Google also does not want to spend money unnecessarily on crawling websites.
In his article, Gary Illyes lists the factors that can negatively affect the crawl budget:
- Faceted navigation: for example, a filter that generates a new URL with each selection
- Duplicate content
- Soft 404 error pages
- Hacked pages
- Low-quality pages
- Spam pages
- Infinite spaces: for example, calendars that generate a new URL for each day
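To illustrate why faceted navigation inflates the number of crawlable URLs, the following sketch groups URLs that differ only in filter or session parameters. It is a simplified example under stated assumptions: the parameter names in `IGNORED_PARAMS` are hypothetical and must be replaced with the ones your own site uses, and the helper names are made up for this illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter/session parameter names; adapt them to your own site.
IGNORED_PARAMS = {"color", "size", "sort", "sessionid"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop ignored query parameters and sort the rest for a stable URL."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    )
    # The fragment is dropped as well, since it is not sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_by_canonical(urls):
    """Map each canonical URL to the list of variants that collapse onto it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return groups
```

If many variants collapse onto a single canonical URL, that is a hint that the filter generates more crawlable URLs than there is distinct content.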
You can save on the Googlebot crawl budget
The simplest way to optimize the crawl budget is to reduce duplicate content. OnPage.org can help you identify the affected pages so that you can remove them where possible.
Figure 1: Identify duplicate content with OnPage.org.
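If you want to check a small set of pages by hand, exact duplicates can be found by hashing the page text. This is only a sketch and not how OnPage.org works internally: it assumes you have already extracted the text of each page into a dictionary, it finds only pages whose normalized text is identical, and near-duplicates are out of scope.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Return groups of URLs whose normalized text is identical.

    `pages` maps each URL to the text extracted from that page.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Each returned group is a candidate for consolidation, for example by removing the extra URLs or redirecting them to one preferred version.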
Examine your robots.txt file and ensure that Googlebot can crawl all relevant areas.
Figure 2: The robots.txt monitoring from OnPage.org
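A quick local check of your robots.txt rules can be sketched with Python's standard library. The helper name `blocked_urls` is an assumption for this example. Note that Python's parser uses simple prefix matching and does not implement every rule Google applies (wildcards, for instance), so the result is a rough check rather than a faithful simulation of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt content disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Run it against the list of URLs you consider important; any URL in the result is closed to Googlebot according to these rules and deserves a second look.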
Update your XML sitemap regularly. By doing this, you show Googlebot all the important URLs on your website that it can follow.
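Generating the sitemap from your list of URLs makes regular updates easier. The following sketch writes the basic `urlset` structure of the sitemaps.org protocol; the function name and the assumption that you have `(URL, last-modified date)` pairs at hand are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs; lastmod is an ISO date string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned string can be saved as UTF-8, for example as `sitemap.xml`, and submitted in Google Search Console. Special characters in URLs, such as `&`, are escaped automatically by the XML library.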
✓ Make crawling easier for Googlebot and increase crawl demand with fresh content.
✓ Check your site regularly for server errors using Google Search Console or OnPage.org.
✓ Avoid pages with little added value, duplicate content, or spam.
With just a few steps, you have conserved crawl budget and optimized your website at the same time. Googlebot will visit your site anyway, but it is in your hands whether you make full use of its potential!
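In addition to the tools named in the checklist, your own server logs show which requests from Googlebot ended in a server error. The sketch below assumes access logs in the common "combined" format and simply trusts the user-agent string, which can be spoofed; the function name and the pattern are illustrative.

```python
import re
from collections import Counter

# Extracts path, status code and user agent from a combined-format log line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_server_errors(log_lines):
    """Count 5xx responses per path for requests whose agent names Googlebot."""
    errors = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # Skip lines that are not in the expected format.
        if "Googlebot" in match["agent"] and match["status"].startswith("5"):
            errors[match["path"]] += 1
    return errors
```

Paths that appear repeatedly in the result are a good starting point for troubleshooting.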