Crawl budget is how fast and how many pages a search engine wants to crawl on your website. It depends on the amount of resources a crawler wants to spend on your site and the amount of crawling your server supports.
More crawling doesn’t mean you’ll rank better. However, if your pages aren’t crawled and indexed, they won’t rank at all.
Most websites don’t have to worry about the crawl budget, but there are a few times when you might want to take a look. Let’s look at some of these cases.
When should you worry about the crawl budget?
You usually don’t have to worry about the crawl budget on popular pages. Usually, it’s pages that are newer, aren’t well linked, or don’t change much that aren’t crawled often.
Crawl budget can be an issue for newer sites, especially sites with many pages. Your server may be able to support more crawling, but since your site is new and probably not very popular yet, a search engine may not want to crawl your site very often. This is mostly a disconnect in expectations. You want your pages to be crawled and indexed, but Google doesn’t know if your pages are worth indexing and may not want to crawl as many pages as you want.
Crawl budget can also be an issue for larger websites with millions of pages or websites that are updated frequently. In general, if many pages are not being crawled or updated as often as you’d like, you should try to speed up the crawl. We’ll talk about how to do this later in this article.
How to check crawling activity
If you want an overview of Google’s crawling activity and the issues it has identified, the Crawl Stats report in Google Search Console is the best place to look.
Here you will find various reports that can help you identify changes in crawling behavior, problems with crawling and more information about how Google is crawling your website.
You definitely want to look into any flagged crawl statuses.
There are also timestamps of when pages were last crawled.
If you want to see hits from all bots and users, you need access to your log files. Depending on your hosting and setup, you may have access to tools like AWStats and Webalizer, which are often available on shared hosts with cPanel. These tools show some aggregated data from your log files.
For more complex setups, you will need to access and store data from the raw log files, possibly from multiple sources. You may also need specialized tools for larger projects, such as an ELK (Elasticsearch, Logstash, Kibana) stack, which allows for storage, processing, and visualization of log files. There are also log analysis tools like Splunk.
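As a rough illustration of working with raw log files, the following sketch counts requests whose user agent claims to be Googlebot, grouped by path and status code. It assumes the server writes the common "combined" access log format, and the sample lines are made up for illustration. Since user agents can be spoofed, treat the result as an approximation rather than verified Googlebot traffic.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ip - user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per (path, status) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /old-page HTTP/1.1" 404 312 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:57:10 +0000] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for (path, status), count in googlebot_hits(SAMPLE_LOG).most_common():
        print(path, status, count)
```

To run it against a real file, pass an open file object instead of `SAMPLE_LOG`, since the function only needs an iterable of lines.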
What does the crawl budget count?
URLs can be found by crawling and parsing pages, or from a variety of other sources including sitemaps, RSS feeds, URLs submitted for indexing in Google Search Console, or the Indexing API.
There are also several Googlebots that share the crawl budget. For a list of the various Googlebots crawling your website, see the Crawl Stats report in GSC.
How Google adjusts its crawling
Each website has a different crawl budget made up of a few different inputs.
Crawl demand is simply how much Google wants to crawl your website. More popular pages and pages that change significantly are crawled more often.
Popular pages or those with more links generally take precedence over other pages. Remember, Google needs to prioritize your pages in some way for crawling, and links are an easy way to determine which pages on your website are more popular. However, it’s not just your website, but all of the pages on all of the websites on the internet that Google needs to figure out how to prioritize.
You can use the Best by links report in Site Explorer to get an indication of which pages are likely to be crawled more often. It also shows you when Ahrefs last crawled your pages.
There is also a concept of staleness. When Google finds that a page is not changing, the page is crawled less often. For example, if they crawl a page and don’t see any changes after a day, they might wait three days before crawling again, ten days the next time, then 30 days, 100 days, and so on. There isn’t actually a set waiting period between crawls, but crawling becomes less frequent over time. However, when Google detects large changes across the website or a website move, the crawl rate is usually increased, at least temporarily.
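The staleness idea can be pictured with a toy model. The sketch below reuses the example intervals from the paragraph above (1, 3, 10, 30, 100 days); these are illustrative numbers only, not values Google uses, and both the function name and the reset-on-big-change behavior are assumptions made for the example.

```python
# Example intervals in days, copied from the illustration in the text.
# They are NOT real Google values; there is no fixed schedule.
ILLUSTRATIVE_INTERVALS = [1, 3, 10, 30, 100]


def next_crawl_interval(unchanged_visits, big_change_detected=False):
    """Return a hypothetical wait in days before the next crawl of a page.

    unchanged_visits: number of consecutive crawls that found no change.
    big_change_detected: models a site-wide change or site move, which
    (in this toy model) resets the wait to the shortest interval.
    """
    if big_change_detected:
        return ILLUSTRATIVE_INTERVALS[0]
    # Clamp to the last entry so the wait stops growing at the longest interval.
    index = min(unchanged_visits, len(ILLUSTRATIVE_INTERVALS) - 1)
    return ILLUSTRATIVE_INTERVALS[index]


if __name__ == "__main__":
    for visits in range(7):
        print(visits, next_crawl_interval(visits))
```

The point of the model is only the shape of the curve: the longer a page stays unchanged, the longer the wait, until something significant changes.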
Crawl rate limit
The crawl rate limit is how much crawling your website can support. Websites can handle a certain amount of crawling before they run into problems with server stability, such as slowdowns or errors. Most crawlers will back off when they see these issues so they don’t harm the site.
Google adjusts based on the crawl health of the website. If the site is fine with more crawling, the limit will be increased. When the website has problems, Google slows the crawl rate.
I want Google to crawl faster
There are a few things you can do to make sure your website can support additional crawling and to increase its crawl demand. Let’s look at some of these options.
Speed up your server / increase resources
Essentially, the way Google crawls pages is to download resources and then process them on its end. Your page speed as a user perceives it isn’t quite the same thing. What affects the crawl budget is how quickly Google can connect and download resources, which has more to do with the server and its resources.
More links, external and internal
Remember, crawl demand is generally based on popularity or links. You can increase your budget by increasing the number of external links and/or internal links. Internal links are easier because you control the site. For suggested internal links, see the Link opportunities report in Site Audit, which also includes a tutorial explaining how it works.
Fix broken and redirected links
Keeping links to broken or redirected pages active on your website will have a small impact on your crawl budget. Usually the pages linked here have a relatively low priority, as they probably haven’t changed in a while, but cleaning up these problems is good for website maintenance in general and helps your crawl budget a bit.
You can easily find broken (4xx) and redirected (3xx) links on your website in the Internal pages report in Site Audit.
For broken or redirected links in the sitemap, check the All issues report for “3XX redirect in sitemap” and “4XX page in sitemap”.
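If you want to run a quick check of a sitemap yourself, the idea can be sketched as follows. This is a minimal example, not how Site Audit works: it parses a standard XML sitemap and flags URLs that answer with 3xx or 4xx. The function names are hypothetical, and `fetch_status` is a placeholder you would supply (a callable that returns the HTTP status for a URL without following redirects). Here it is stubbed with made-up data so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def classify(status):
    """Label 3xx as redirected and 4xx as broken; anything else is not flagged."""
    if 300 <= status < 400:
        return "redirected"
    if 400 <= status < 500:
        return "broken"
    return None


def audit_sitemap(xml_text, fetch_status):
    """Return {url: category} for sitemap URLs that are redirected or broken."""
    problems = {}
    for url in sitemap_urls(xml_text):
        category = classify(fetch_status(url))
        if category:
            problems[url] = category
    return problems


# Made-up sitemap and statuses, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old</loc></url>
  <url><loc>https://example.com/gone</loc></url>
</urlset>"""
FAKE_STATUSES = {
    "https://example.com/": 200,
    "https://example.com/old": 301,
    "https://example.com/gone": 404,
}

if __name__ == "__main__":
    for url, category in audit_sitemap(SAMPLE_SITEMAP, FAKE_STATUSES.get).items():
        print(category, url)
```

Passing the status lookup in as a callable keeps the parsing and classification testable without network access.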
Use GET instead of POST where you can
This one is a bit more technical in that it involves HTTP request methods. Don’t use POST requests where GET requests work. It’s basically GET (pull) vs. POST (push). POST requests aren’t cached, so they do impact the crawl budget, whereas GET requests can be cached.
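To make the difference between the HTTP GET and POST methods concrete, here is a small sketch that builds the same lookup both ways using Python’s standard library. The endpoint and parameters are made up. With GET, the parameters are part of the URL, so the URL alone identifies the response and it can be cached; with POST, they travel in the request body. No request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def as_get(base_url, params):
    """Put the parameters in the query string, so the URL identifies the response."""
    return Request(f"{base_url}?{urlencode(params)}", method="GET")


def as_post(base_url, params):
    """Send the same parameters in the request body instead."""
    body = urlencode(params).encode("utf-8")
    return Request(base_url, data=body, method="POST")


if __name__ == "__main__":
    # Hypothetical endpoint and parameters, for illustration only.
    endpoint = "https://example.com/api/products"
    query = {"category": "shoes", "page": 2}

    get_request = as_get(endpoint, query)
    post_request = as_post(endpoint, query)
    print(get_request.get_method(), get_request.full_url)
    print(post_request.get_method(), post_request.full_url, post_request.data)
```

If your pages load data this way (for example through XHR calls), read-only lookups like this one are the candidates for switching to GET.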
Use the Indexing API
If you want pages to be crawled faster, check whether you’re eligible for Google’s Indexing API. Currently, this is only available for a few use cases such as job postings or live videos.
Bing also has an Indexing API that is available to everyone.
What won’t work
There are a few things that are sometimes tried that don’t really help your crawl budget.
- Small changes to pages. Making small changes to pages, such as updating dates, spaces, or punctuation marks, in the hope that pages will be crawled more often. Google is pretty good at determining whether changes are significant or not, so those small changes are unlikely to have any impact on crawling.
- Crawl-delay directive in robots.txt. This directive will slow down many bots. However, Googlebot doesn’t use it, so it has no effect. We at Ahrefs do respect it, so if you ever need to slow down our crawling, you can add a crawl delay to your robots.txt file.
- Removing third-party scripts. Third-party scripts don’t count towards your crawl budget, so removing them won’t help.
- Nofollow. Okay, this one is iffy. In the past, nofollow links wouldn’t have used crawl budget. However, nofollow is now treated as a hint, so Google may choose to crawl these links.
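For the crawl-delay item above, a robots.txt entry might look like the one embedded in this sketch. The 10-second value and the helper name are made up for illustration. The code uses Python’s standard `urllib.robotparser` to read back what the file declares for a given user agent; it only shows what is written in the file, not whether a particular bot honors it (Googlebot, as noted, does not).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a 10-second crawl delay for AhrefsBot only.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay declared for user_agent, or None if there is none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("AhrefsBot:", crawl_delay_for(ROBOTS_TXT, "AhrefsBot"))
    print("Googlebot:", crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```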
I want Google to crawl slower
There are only a few good ways to slow down Google’s crawling. There are some other adjustments that you could technically make, such as slowing down your website, but I wouldn’t recommend these methods.
Slow adjustment, but guaranteed
The main control Google gives us to crawl slower is the rate limiter in Google Search Console. You can use the tool to slow the crawl rate, but it can take up to two days to take effect.
Fast adjustment, but with risks
If you need a faster solution, you can take advantage of the way Google adjusts its crawl rate to the health of your website. If you serve Googlebot a “503 Service Unavailable” or “429 Too Many Requests” status code on pages, it will crawl more slowly or may stop crawling temporarily. However, you don’t want to do this for more than a few days, or pages may start to be dropped from the index.
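One way to picture this is a small piece of middleware that answers Googlebot with a 503 while the server is struggling. The sketch below uses the standard WSGI interface; the names `make_throttling_app` and `is_overloaded` are hypothetical, and the `Retry-After` header is a standard HTTP header included as an assumption, since how a crawler treats it is not covered here. Matching on the user agent string is a simplification.

```python
def make_throttling_app(app, is_overloaded, retry_after_seconds=3600):
    """Wrap a WSGI app and answer Googlebot with 503 while is_overloaded() is true.

    is_overloaded: hypothetical callable returning True when the server
    should shed crawler traffic (for example, based on load or error rate).
    """
    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if "Googlebot" in agent and is_overloaded():
            body = b"Service temporarily unavailable\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                # Assumption: hint at when to come back; not guaranteed to be used.
                ("Retry-After", str(retry_after_seconds)),
            ])
            return [body]
        # Everyone else, and Googlebot when healthy, gets the normal response.
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    """Stand-in for the real site, used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Always "overloaded" here, purely to demonstrate the 503 path.
    throttled = make_throttling_app(demo_app, is_overloaded=lambda: True)
    with make_server("127.0.0.1", 8000, throttled) as server:
        server.serve_forever()
```

Tying the 503 to a real health signal, rather than leaving it on permanently, keeps the throttling temporary, which matters given the risk of pages being dropped from the index.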
Again, I want to reiterate that the crawl budget is nothing to worry about for most people. If you have any concerns, I hope this guide has been helpful.
I usually only look into it when there are issues with pages not getting crawled and indexed, when I need to explain why someone shouldn’t worry about it, or when I happen to see something that concerns me in the Crawl Stats report in Google Search Console.
Have any questions? Let me know on Twitter.