How To Optimise For Crawl Budget

Generally speaking, there are 4 key ways in which you can maximise crawl budget for your website. We discuss each of these in detail in this post.

Now that you’ve read What is Crawl Budget and When Should You Worry About It, you understand what crawl budget is and why it may be something your business needs to think about. Next, you probably want to know how to optimise for it. In the second part of this 2-part blog series, we’ll offer some expert advice to help you understand what you need to know.

How Can I Maximise Crawl Budget?

There are a number of different aspects of your website’s performance and structure that can cause crawl budget issues. Generally speaking, there are 4 key ways in which you can maximise crawl budget for your website:

  1. Ensure that you have an XML sitemap
  2. Improve site speed
  3. Optimise Site Architecture and internal linking
  4. Limit unnecessary URLs

Ensure you have an XML sitemap

An XML sitemap is a file on your website that feeds search engines data on the pages of the site, as well as the crawl priority or hierarchy of site content - in other words, it tells search engine crawlers what you want them to look at on your site.

Whilst having an XML sitemap does not guarantee that all your pages will get crawled and indexed, it does increase your chances, particularly if your site architecture and internal linking are not optimal.

A sitemap should be created at the root level of your site (usually mysite.com/sitemap.xml) and submitted to the major search engines, e.g. to Google via Google Search Console and to Bing via Bing Webmaster Tools.
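
For reference, a minimal sitemap file looks something like this (the URLs and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawled -->
  <url>
    <loc>https://www.mysite.com/</loc>
    <lastmod>2023-01-01</lastmod>
  </url>
  <url>
    <loc>https://www.mysite.com/products/product-a</loc>
    <lastmod>2023-01-01</lastmod>
  </url>
</urlset>
```

Most CMSs and SEO plugins can generate and update a file like this automatically, so you rarely need to maintain it by hand.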

Improve Site Speed

This one is pretty self-explanatory. The faster your pages load, the faster a crawler can crawl and index them. By reducing the delay between Googlebot visiting one page and moving on to the next, you increase the chances of it crawling and indexing more pages.

Ensure that your site content loads quickly by:

1. Using a fast web hosting service

When it comes to site performance, your web hosting service makes a huge difference. It can be tempting to go with the cheapest possible option for web hosting, particularly for a new website, but don’t forget to upgrade as your site begins to get more traffic.

Whilst a basic shared hosting package may be fine to start out with, once you start generating a lot of traffic to your site, you will want to consider a more robust hosting option like VPS or dedicated hosting.

2. Optimising your images

Poorly optimised images will tank your site’s page load speed. URLs that load oversized images will often have them scaled down by the browser. Scaling images in the browser is bad for performance: it takes extra CPU time, and the user ends up downloading data they don’t use.

You should set an upper limit for image file sizes on your site - the sweet spot varies by site type (e.g. e-commerce sites will want to prioritise image quality a little more), but here at adaptive we recommend an upper target of 200 KB.

All images should be compressed and optimised for web before uploading to your site. Photoshop and other common photo editing tools have "Save for web" options or similar.
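
Alongside compression, serving appropriately sized files in your HTML helps avoid browser scaling. A sketch of this (the file names and dimensions are illustrative):

```html
<!-- Let the browser pick a suitably sized file rather than
     downloading one oversized image and scaling it down -->
<img src="product-a-480.jpg"
     srcset="product-a-480.jpg 480w, product-a-960.jpg 960w"
     sizes="(max-width: 600px) 480px, 960px"
     width="480" height="360"
     alt="Product A">
```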

3. Using a Content Delivery Network (CDN)

Using a Content Delivery Network (CDN) is a good option, particularly if you have a lot of traffic from a wide geographical range - a CDN is essentially a global network of servers on which you can cache your site content. When a user requests files from your site, that request will be routed to the closest server.
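
In practice, pointing your static assets at a CDN often comes down to a single DNS record (the host names below are hypothetical):

```
; Hypothetical DNS record: serve static assets from a CDN's edge network
static.mysite.com.   CNAME   mysite.cdn-provider.example.
```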

4. Minimising HTTP requests

An HTTP request is made each time an element on your page is downloaded, e.g. an image, stylesheet or script.

Some CMSs and themes are notorious for bloating your page templates with scripts. Each additional tracking tool, 3rd party integration or other “plugin” will add to this bloat.

Make sure that your site’s code is as lightweight and efficient as possible, and only add bulk (e.g. 3rd-party plugins) where it adds significant value to your users.

5. Minifying and combining files

HTML, CSS and JavaScript files are pivotal to how your site appears and behaves.

However, they also add to the number of HTTP requests your site needs to make each time a user views a page.

You can reduce bloat by minifying and combining files where appropriate.

Minifying files involves optimisations like removing unnecessary commas, spaces, other characters, code comments, stale code and formatting.

Combining files is exactly what it says on the tin - if you have multiple CSS or JavaScript files running on your site, you can sometimes combine them to reduce the number of HTTP requests required.
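
To illustrate, here is the same stylesheet rule before and after minification:

```css
/* Before minification: readable, but larger to download */
.product-card {
    margin: 0 auto;
    padding: 16px;   /* inner spacing */
}

/* After minification: identical behaviour, fewer bytes */
.product-card{margin:0 auto;padding:16px}
```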

6. Using Asynchronous Loading for CSS and JavaScript files

Resources like CSS and JavaScript files can be loaded in two different ways: synchronously or asynchronously.

Synchronous loading downloads and applies resources one at a time, in the order they appear on the page; asynchronous loading downloads and applies a resource in the background, independently of the others.

Loading files asynchronously can improve page speed because the browser does not have to stop rendering the rest of the page while it waits for each resource to download.
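
For example, script tags can be marked up so that they do not block rendering (the file name is a placeholder):

```html
<!-- Blocks rendering while it downloads and executes -->
<script src="analytics.js"></script>

<!-- Downloads in the background; runs as soon as it is ready -->
<script src="analytics.js" async></script>

<!-- Downloads in the background; runs after the page has been parsed -->
<script src="analytics.js" defer></script>
```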

7. Enabling browser caching

When a user visits your site, their browser cache will save information and data, including images and HTML that are necessary for that user to see your site. This means that the next time that user visits your site, their browser can load the page without having to send another HTTP request to the server.

There are different ways to set up caching depending on how your website has been built.
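
As one sketch, on an nginx server browser caching for static assets can be enabled with response headers like these (the file types and lifetime are examples, not recommendations):

```nginx
# Tell browsers they may reuse static assets for 30 days
# before requesting them from the server again
location ~* \.(jpg|jpeg|png|gif|css|js|woff2)$ {
    expires 30d;
    add_header Cache-Control "public";
}
```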

8. Enabling lazy loading

Lazy loading is the practice of delaying the load or initialisation of resources or objects until they’re actually needed, e.g. content that is “below the fold”.

By implementing lazy loading to defer below-the-fold content, the user does not need to wait for the full page to download, and initial page load times are significantly reduced.

Again, lazy loading may be implemented in different ways depending on how your site has been built.
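
On modern browsers, the simplest implementation is the native `loading` attribute on images and iframes (the file name here is illustrative):

```html
<!-- Deferred until the user scrolls near it -->
<img src="below-the-fold-photo.jpg" loading="lazy"
     width="800" height="600" alt="Gallery photo">
```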

9. Reducing redirects

Redirecting one URL to another is appropriate in many situations. However, if redirects are done incorrectly, it can lead to disastrous results. Two common examples of improper redirect usage are redirect chains and loops.

Long redirect chains and infinite loops lead to crawlers wasting a lot of crawl budget and eventually giving up on crawling that chain (Google says it will give up after 5 redirects in a row).

Review your site for redirects with a tool like Screaming Frog, and reduce redirect chains by pointing every link in each chain directly at the final destination URL.
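
To illustrate, here is a redirect chain flattened in an Apache .htaccess file (the URLs are hypothetical):

```apache
# Before: /old-page -> /interim-page -> /new-page (a 2-hop chain)
# After: every legacy URL points directly at the final destination
Redirect 301 /old-page     /new-page
Redirect 301 /interim-page /new-page
```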

Optimise Site Architecture and Internal Linking

Googlebot and other search engine crawlers find new content by following HTML links from pages they have already crawled on your site. You can make it easier for Google to quickly and efficiently find and crawl all of your pages by ensuring that you have:

  1. A logical site architecture
  2. Logical internal linking structures to match this site structure
  3. Optimised Pagination

1. Logical Site Architecture

A logical site structure has many benefits for SEO: it helps Google find all of your pages; it helps spread link equity (page authority) throughout your site; and, most importantly in the context of this article, it ensures that crawlers can efficiently find all of the pages on your site without working too hard.

There are 5 steps to ensuring a logical site architecture:

  1. Plan out your hierarchy before developing your site. Card sorting exercises are useful at this stage - write down the topic of every page you plan to include on your site (keyword research is always a useful exercise prior to this point), then “bucket” those topics into categories. You should end up with 3-7 main categories and try to ensure you have evenly balanced subcategories below that (and not too many).
  2. Create a URL structure that follows your site hierarchy - mydomain.com/category-a/sub-category-a/my-page is generally better than mydomain.com/my-page. For example if your hierarchy looks like that in the image below, the URL structure for the Product A page would look like this: www.mysite.com/products/product-a
  3. Create your navigation in HTML or CSS (not JavaScript/Flash/etc.) - keep the coding of navigational features like menus simple, as crawlers may find it difficult to discover new pages via JavaScript / Flash / Ajax navigation structures
  4. Use a shallow navigation structure where possible - pages should not be buried too deeply within your site, as shallow sites work better in terms of both UX and SEO.
  5. Develop logical internal linking structures to match your site architecture...
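
The hierarchy and URL structure described in steps 1 and 2 might look something like this for a simple site (the category and page names are made up):

```
www.mysite.com/
├── products/                   → /products
│   ├── product-a               → /products/product-a
│   └── product-b               → /products/product-b
└── blog/                       → /blog
    └── choosing-a-product      → /blog/choosing-a-product
```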

2. Logical Internal Linking Structures

Internal links are links that go from one page on a domain to a different page on the same domain. This includes:

A. Main navigation links (e.g. links in your main menu)

B. Secondary or other navigation links (e.g. sidebar menu links within certain sections of your site)

C. Hyperlinks within your content

Rather than relying on hyperlinks within your content alone (C), you should ensure that:

  • All primary sections in your IA are linked to from your main navigation (A)
  • All related content within these sections (e.g. content in the same section / products in the same category) is automatically clustered together via internal linking structures like “In This Section” or “Related Services” secondary navigation menus, like example (B) above, at a site section level

3. Optimised Pagination

Websites with a lot of content often use pagination so that they can quickly and easily provide content to users. For example, a category landing page or a product list page may be split out into multiple pages, each with a manageable amount of content or links (e.g. links to products or blog posts).

Issues arise with pagination when key content is difficult for robots to reach e.g. when:

  • A single category or product list is extremely long and is the only entry point for its “child” content (i.e. the other content it links to)
  • The pagination implementation does not allow users to click further than a few pages down the list (apart from the last page) - for example, B below is preferable to A.

You can avoid pagination-related crawling issues by:

  1. Segmenting your category / list pages into subcategories with more manageable amounts of content (an added benefit of this approach is that you now have optimisable landing pages for longer tail search terms)
  2. Offering more items per page, thus reducing the number of paginated pages and therefore the click depth of content on the deeper paginated pages
  3. Optimising your pagination implementation to include links to more of the paginated pages (e.g. example B above).
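
For example, a pagination block that exposes deeper pages to crawlers (style B above) might be marked up like this (the URL pattern is illustrative):

```html
<!-- Links to the first few pages plus the last page, so no
     paginated page is more than a couple of clicks away -->
<nav aria-label="Pagination">
  <a href="?page=1">1</a>
  <a href="?page=2">2</a>
  <a href="?page=3">3</a>
  <span>…</span>
  <a href="?page=10">10</a>
</nav>
```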

Limit unnecessary URLs

Google does not look at “pages” in the same way as many users do. Googlebot looks at URLs and, unless you tell it to behave otherwise, it will treat each unique URL that it finds as a unique page with content worthy of crawling and indexing. However, certain CMSs will often create a lot of unnecessary URLs that do not contain unique, useful content for the end user. These “unnecessary URLs” should be minimised. They come in many forms, including:

  1. Navigation Filter URLs
  2. Tracking Parameter URLs

1. Navigation Filter URLs

Navigation filter URLs are designed to narrow down items within a site’s listing page and display this “filtered” information to users. For example, on an e-commerce site a user can apply a colour filter to a product listing page, which will narrow down the list of products on the page while appending parameters to the base URL.

If not managed correctly, this type of filtering functionality can cause serious crawl budget issues by creating near duplicates of important pages. If these filtered URLs are accessible to robots, and particularly if those filters can be combined, then crawlers will keep finding endless new filter combinations, leading to thousands more pages to crawl. Ultimately, this will lead bots to waste crawl budget that should be reserved for genuine, unique URLs.

Adding a canonical tag can stop a page with a dynamic URL from being indexed, but it will not stop that page from being crawled. Therefore, it is best practice to ensure that links to filter URLs that do not result in unique content with organic traffic potential are tagged with a nofollow attribute. How you do this will depend on your CMS and other site details, so speak to your developer to identify the best approach for you. Specific filter parameters can also be blocked via the parameter tool in Google Search Console.
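
As a sketch, a colour filter link on a hypothetical product listing page might carry both treatments described above:

```html
<!-- On /products/shoes/?colour=red -->

<!-- Canonical tag: tells Google the unfiltered page is the "real" one,
     preventing the filtered URL from being indexed -->
<link rel="canonical" href="https://www.mysite.com/products/shoes/">

<!-- nofollow on the filter link itself discourages crawlers from
     following it in the first place -->
<a href="/products/shoes/?colour=red" rel="nofollow">Red</a>
```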

2. Tracking Parameter URLs

Tracking parameters (e.g. UTM parameters) are short strings of text that you add to URLs (e.g. in adverts or affiliate links) to send data on their usage back to 3rd-party tools, e.g. to help you track the performance of a specific ad campaign in Google Analytics.

If you use a lot of different tracking parameter URLs (and combinations) to drive traffic to your site, then you will end up with what Google interprets as multiple different versions of the same page. In the worst case scenario Google will see all of these as individual pages that are worthy of crawling and indexing.
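
As an illustration, Python’s standard library can show why: each combination of tracking parameters produces a distinct URL, even though the path - the actual page - is the same (the campaign values below are made up):

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical ad URL: the path identifies the actual page, while the
# query string exists only to feed data back to analytics tools.
url = ("https://www.mysite.com/landing-page"
       "?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale")

parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.path)    # the "real" page: /landing-page
print(params)        # tracking-only data Google may mistake for a new page
```

To a crawler, every distinct combination of those parameter values is a distinct URL pointing at identical content.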

Again, in this scenario you should ensure that URLs with parameters that do not result in unique content with organic traffic potential are handled in the same way: tag links to them with a nofollow attribute, and block specific parameters via the parameter tool in Google Search Console.

In Conclusion

While Google may downplay the significance of crawl budget for most site owners, in our experience it is something that you should monitor, especially if your business maintains an e-commerce store or another type of large and potentially complicated website.

To ensure that your site’s indexability is not negatively affected by crawl budget issues, you should ensure that it is fast, that it is logically structured with comprehensive internal linking, and that it is not creating excessive amounts of indexable duplicate content via dynamic URLs.

If you suspect that Google is finding it difficult to find, crawl and rank some of your content, feel free to get in touch to discuss an SEO audit.