How to Perform Your First Log File Analysis
Let’s first look at how Google’s bots analyze a website. Then, we’ll take a closer look at the log file analysis and how you can perform your first one today.
- How does Google crawl your site?
- What are log files?
- Why log file analysis is important for SEO
- What is a log file analysis?
- Performing your first log file analysis
How does Google crawl your site?
Google is an incredibly far-reaching, amorphous entity that is constantly searching the dustiest corners of the web to document every available site. To keep its database current, and its algorithm meeting the needs of users, Google needs to consume and catalog the entire internet regularly.
Doing that by hand would take impossible manpower. Enter Googlebot. Googlebot is just what it sounds like: a robot (well, a collection of robots). These bots are known as web crawlers, built and used by Google to find and evaluate content all over the World Wide Web.
When a Googlebot crawls a website, it takes in all of the relevant data it can find: text, pictures, graphics, metadata, header tags, etc. Then, the bots place all of that information in a catalog for your site: a kind of file that Google references when making algorithmic decisions.
Using the information gleaned by its bots, Google evaluates the relevancy of your site and web pages. It does this with a complex and ever-changing algorithm that evaluates the usefulness of your site for various queries. But, while the algorithm itself is complex, its purpose is not. Google wants to stay in business. And, in the simplest sense, it does that by continuing to answer users’ search queries better than any competitor. By focusing your attention on best meeting the needs of your ideal customers on your site, you will fight side by side with Google’s algorithm rather than against it.
Google has a lot to do. Its bots can’t spend all day on your site just because you’d like them to. They will give your site a limited crawl budget when they locate it, and it is up to you to make the best of that time. Relevance and keyword rankings are determined by these crawls, so be sure that your SEOs know how to maximize the limited time Google allocates to your site.
This pressure of a limited crawl budget is where the log file comes in handy.
What are log files?
Log files are records created by servers that document activity such as visitor behavior and server actions on a website. These files are automatically generated and provide a chronological record of events, capturing data like IP addresses, page requests, response codes, and timestamps. Essentially, log files serve as a diary for a website, meticulously noting every interaction that occurs between a server and its visitors.
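To make that concrete, here is what a single request record looks like in the widely used Apache/NGINX “combined” log format, along with a minimal Python sketch that pulls out the fields described above. The sample line, the regex, and the format itself are illustrative assumptions; your server may log in a different format.

```python
import re

# One request record in the common Apache/NGINX "combined" log format
# (illustrative only; your server's format may differ).
sample_line = (
    '66.249.66.1 - - [10/Jan/2024:06:25:14 +0000] '
    '"GET /blog/log-file-analysis/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Named groups for the fields a log file analysis cares about:
# IP address, timestamp, request, response code, and user agent.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

match = COMBINED.match(sample_line)
if match:
    for field, value in match.groupdict().items():
        print(f"{field}: {value}")
```

Every log analysis tool mentioned later in this article is essentially doing this kind of parsing at scale.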
Why log file analysis is important for SEO
Log file analysis is a critical aspect of SEO, providing insights into how search engine bots interact with a website. By analyzing these files, SEO professionals can understand a site’s crawl behavior, identify potential issues, and make informed decisions to optimize their search engine performance. This analysis reveals the frequency of crawling, the pages prioritized by search engines, and any obstacles that bots may encounter. Understanding these elements is key to enhancing a website’s visibility and performance in search engine results.
What is a log file analysis?
A log file analysis is the investigation of your existing log files, which should provide the insights needed to:
- Understand Googlebot’s priorities and behavior while crawling your site
- Identify any issues Google has crawling the site
- Provide an action plan to resolve those issues and optimize your site for prime crawlability
The log file analysis has three steps: data gathering, analysis, and implementation. I’ll walk you through each phase to show how it feeds into the next.
Performing your first log file analysis
Gathering the data
Before you begin the log file analysis, you need to be sure you’re looking at the correct data. Use Screaming Frog Log File Analyzer to help you locate the right information. Here’s what to look for:
- 1-3 months of access logs from the domain being analyzed: This much historical log data will give you an idea of Google’s most recent and relevant crawl behavior for your site. Log files are typically stored on the server where your website is hosted; you can access them through the server’s control panel, via FTP (File Transfer Protocol), or by requesting them directly from your hosting provider. Make sure you have the correct permissions to access these files. Once obtained, the logs are often in a raw format, such as .log or .txt files, and can be quite large depending on the website’s traffic and server configuration. (A quick sanity check for raw logs is sketched after this list.) If you are using Screaming Frog Log File Analyzer to run the actual analysis as well (which we recommend), the access logs will need to be in one of the following formats:
- W3C
- Apache and NGINX
- Amazon Elastic Load Balancing
- HAProxy
- JSON
- Screaming Frog crawl data: This data will be overlaid with the log file crawl data to match up things like rel=“canonicals,” meta robot tags, and other URL-specific data. Having a range of data will help tell a complete story on how Google is crawling your site, thus leading to more informed recommendations.
- Google Analytics data: This will also be overlaid with the log file crawl data as a way to see how the most conversion-heavy pages are being crawled by Google. It will also contain session data that will help us understand the implications of Google’s crawls on your site.
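Before loading anything into a tool, it is worth a quick sanity check that your export actually covers the full 1-3 month window and contains Googlebot activity. Here is a rough Python sketch; the access.log filename, the combined log format, and the simple “Googlebot” user-agent check are all assumptions, so adjust them to your own setup.

```python
from collections import Counter
from datetime import datetime
import re

LOG_PATH = "access.log"  # assumed filename; point this at your exported logs

# Pulls the date out of a combined-format timestamp, e.g. [10/Jan/2024:06:25:14 +0000]
DATE = re.compile(r"\[(\d{2}/\w{3}/\d{4})")

total_by_day = Counter()
googlebot_by_day = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as handle:
    for line in handle:
        found = DATE.search(line)
        if not found:
            continue  # skip malformed lines
        day = datetime.strptime(found.group(1), "%d/%b/%Y").date()
        total_by_day[day] += 1
        if "Googlebot" in line:  # quick user-agent check, not a verified-bot check
            googlebot_by_day[day] += 1

for day in sorted(total_by_day):
    print(f"{day}: {total_by_day[day]} requests, {googlebot_by_day[day]} from Googlebot")
```

If whole days are missing or Googlebot counts drop to zero, chase that down with your hosting provider before drawing conclusions from the analysis.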
Once you have gathered the pertinent website logs and data, you’ll be able to move on to the actual analysis.
Analysis
To analyze all this data, I use the following toolset:
- Screaming Frog Log File Analyzer: This is the core tool we use in the log file analysis. Here’s a great intro guide on what this tool is and how to use it.
- Screaming Frog SEO Spider: This is what we’ll use to extract the URL-specific data for the site being crawled.
- Google Sheets or Excel: This is where we’ll be doing our data manipulation.
In executing the log file analysis, here are a few things to look for:
- Are there any subfolders being over/under-crawled by Googlebot?
- To find this, go to the Screaming Frog Log File Analyzer: Directories, paying special attention to the crawl counts from Googlebot. (A scripted way to run this check and the 4XX/302 check on raw logs is sketched after this list.)
- Are your focus pages absent from Google’s crawls?
- To find this, go to the Screaming Frog Log File Analyzer: URLs. If you have Screaming Frog SEO Spider data coupled with the log file data, you can filter down the HTML data with the view set to ‘Log File.’ From there, you can search for the focus pages you want Google to care most about and get a feel for how they are being crawled.
- Are there slow subfolders being crawled?
- To find this, go to the Screaming Frog Log File Analyzer: Directories. Sort by Average Bytes, Googlebot, and Googlebot Smartphone (descending) so that you can see which subfolders are the slowest.
- Are any non-mobile-friendly subfolders being crawled by Google?
- To find this, go to the Screaming Frog Log File Analyzer: Directories. Sort by Googlebot Smartphone to see which subfolders aren’t getting crawled by that particular Googlebot, which could indicate a mobile-friendliness issue that needs to be addressed.
- Is Google crawling redundant subfolders?
- To find this, go to the Screaming Frog Log File Analyzer: Directories. As you examine the subfolders listed there, you should be able to spot which directories are redundant and need to be consolidated or redirected.
- Are any 4XX/302 pages being crawled by Googlebot?
- To find this, go to the Screaming Frog Log File Analyzer: URLs. Once you identify the broken pages Google is hitting, you’ll know which ones to prioritize for 301 redirects.
- Is Google crawling any pages marked with the meta robot no-index tag?
- To find this, go to the Screaming Frog Log File Analyzer: URLs. Sort by ‘Indexability,’ then by ‘Googlebot’ and ‘Googlebot Smartphone,’ to get a feel for which pages are marked as no-index but are still getting crawled by Google.
- Are the rel canonicals correct for heavily crawled pages?
- To find this, go to the Screaming Frog Log File Analyzer: URLs. Here you can check whether the most heavily crawled pages have the correct rel canonical URLs.
- What updates to the robots.txt file/sitemap.xml are needed to ensure your crawl budget is being used efficiently?
- Based on what you find in your analysis, you’ll be able to identify which subfolders or URLs you’ll need to disallow (robots.txt), remove, or include in the sitemap so you’re sending the clearest possible signals to Google regarding which pages you want crawled.
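If you would rather triage very large logs in a script before (or alongside) the Log File Analyzer, the sketch below answers the subfolder question and the 4XX/302 question directly from a raw access log. It assumes a combined-format file named access.log and identifies Googlebot by a simple user-agent substring rather than a verified reverse-DNS lookup, so treat its output as a rough cut, not a definitive report.

```python
from collections import Counter
from urllib.parse import urlsplit
import re

LOG_PATH = "access.log"  # assumed filename; point this at your exported access log

# From each combined-format line we only need the request path,
# the response code, and the user agent.
LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<user_agent>[^"]*)"$'
)

crawls_by_subfolder = Counter()
error_hits = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as handle:
    for line in handle:
        found = LINE.search(line.rstrip("\n"))
        if not found or "Googlebot" not in found.group("user_agent"):
            continue
        path = urlsplit(found.group("path")).path
        # Use the first path segment as the subfolder, e.g. /blog/post -> /blog/
        segments = [s for s in path.split("/") if s]
        subfolder = f"/{segments[0]}/" if segments else "/"
        crawls_by_subfolder[subfolder] += 1
        status = found.group("status")
        if status.startswith("4") or status == "302":
            error_hits[(status, path)] += 1

print("Googlebot crawls by subfolder:")
for subfolder, count in crawls_by_subfolder.most_common(20):
    print(f"  {subfolder}: {count}")

print("\n4XX/302 URLs crawled by Googlebot:")
for (status, path), count in error_hits.most_common(20):
    print(f"  {status} {path}: {count}")
```

The per-subfolder counts can be pasted straight into Google Sheets or Excel for the data manipulation step described above.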
Implementation
In answering these questions, you’ll gain valuable insight into what may be holding back your website’s performance and how you can improve it. But the journey doesn’t stop there. Once you have these insights, you must work to implement them. You’ll want to build a list of items that need tackling, a plan for how you’ll implement those changes, and a plan to improve the crawlability of your site going forward.
Some of the items we’d recommend you focus on include:
- Configuring and improving how Google crawls your site
- Using robots.txt to disallow sections of the site that Google is spending time on but that don’t need to be crawled (a sample rule set, with a way to spot-check it, is sketched after this list)
- Identifying additional technical SEO fixes for the site
- Updating meta robot tags to better reflect where you would like Google to focus its crawl budget
- Broken pages
- Building 301 redirects for 404 pages that Googlebot is consistently hitting
- Duplicate content
- Building a content consolidation game plan for redundant pages that Google is splitting its crawl budget on
- This game plan would involve mapping out which duplicate/redundant pages (and even subfolders) should either be redirected or have their content folded into the main pages being leveraged in the site’s keyword-targeting strategy
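For the robots.txt item above, the change itself is usually just a few Disallow rules. The sketch below shows a hypothetical rule set (the paths and domain are placeholders, not recommendations) and uses Python’s standard-library robot parser to spot-check which URLs Googlebot would be allowed to fetch. The standard-library parser is a reasonable approximation but not identical to Google’s own matching, so verify any important rules before deploying them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking low-value sections the analysis
# flagged as wasting crawl budget. The paths are placeholders, not advice.
robots_txt = """\
User-agent: *
Disallow: /internal-search/
Disallow: /print/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Spot-check how a crawler honouring these rules would treat a few URLs.
for url in (
    "https://www.example.com/blog/log-file-analysis/",
    "https://www.example.com/internal-search/?q=widgets",
    "https://www.example.com/print/product-123/",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:>7}  {url}")
```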
Once this list of recommended changes has been built, you’ll need to work with your web development team to prioritize your next steps. I recommend rating each item on a scale of 1-5 in these three categories (a simple scoring sketch follows the list):
- Difficulty to implement
- Turn-around time
- Potential for SEO yield
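There is no single right way to combine those three ratings; the short sketch below is just one reasonable weighting (favour SEO yield, discount difficulty and turn-around time), with placeholder items and scores, to show how a working order might be produced.

```python
# Illustrative items from the recommendations list, each rated 1-5.
# Names and scores are placeholders; rate your own findings.
items = [
    {"name": "301 redirects for crawled 404s", "difficulty": 2, "turnaround": 1, "seo_yield": 4},
    {"name": "robots.txt disallow rules",      "difficulty": 1, "turnaround": 1, "seo_yield": 3},
    {"name": "Content consolidation plan",     "difficulty": 4, "turnaround": 5, "seo_yield": 5},
]

def priority(item):
    # Higher is better: reward expected SEO yield, discount effort and wait time.
    return 2 * item["seo_yield"] - item["difficulty"] - item["turnaround"]

for item in sorted(items, key=priority, reverse=True):
    print(f"{priority(item):>3}  {item['name']}")
```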
Once the priority has been established, you’ll work with your web development team to implement these fixes in a manner that works best for their development cycles.
Ready for some results?
Sounds like a lot of work, but it’s worth it. To show you just how important this analysis can be, here’s a brief case study that demonstrates the impact a log file analysis can have on an SEO strategy.
During a recent client engagement, we were working to increase e-commerce transactions brought in from Google organic traffic.
We began the journey as we generally do, by performing a series of technical audits. As we examined Google Search Console, we noticed some indexation irregularities. Specifically, pages were missing from Google’s index, and overall coverage of the site was incomplete. This is a common symptom of a crawlability issue.
So, we ran a log file analysis to identify ways we could improve how Google crawls the site. Some of these findings included:
- Several redundant subfolders being crawled by Google
- Broken pages missed in our initial site audit that needed to be redirected
- Various subfolders Google was spending time crawling that played no role in our keyword ranking strategy
We created an action plan based on these findings and worked with a web development team to ensure they were addressed.
Once the log file findings were implemented, we saw the following results (comparing 30 days with the previous 30 days):
- E-commerce transactions increased by 25%
- E-commerce conversion rate increased by 19%
- Google organic e-commerce revenue increased by 25%
As with all SEO strategies, it’s important to make sure Google acknowledges the changes you’re making and rewards your site accordingly. Running a log file analysis is one of the best ways you can make sure this happens, regardless of the other technical SEO fixes you are implementing.
FAQ
What is log file analysis in SEO?
Log file analysis in SEO refers to the process of examining server logs to understand how search engine bots interact with your website. This analysis helps in identifying crawl patterns, indexing issues, and opportunities for optimizing search engine visibility.
Why is log file analysis important?
Log file analysis is crucial because it provides direct insight into how search engines are crawling and indexing a website. This understanding is key to improving a site’s SEO performance by ensuring that search engines can effectively access and evaluate all relevant content.
How often should you run a log file analysis?
The frequency of log file analysis depends on the size and complexity of the website, as well as the dynamics of its content updates. For most sites, a monthly analysis is sufficient, but for larger, more dynamic sites, a more frequent analysis may be necessary.
What are the main challenges of log file analysis?
The main challenges include dealing with a large volume of data, extracting relevant information, and interpreting the technical data in a meaningful way for SEO improvements.
Can log file analysis help with website security?
Yes, log file analysis can also be used to identify potential security issues, such as unauthorized access attempts or suspicious activity, since log files record all server requests.
What tools are used for log file analysis?
Popular tools for log file analysis include Screaming Frog Log File Analyzer, Splunk, and Logz.io. These tools vary in features and complexity, catering to different analysis needs.
How does log file analysis differ from Google Analytics?
While Google Analytics provides user behavior data, log file analysis offers insights into how search engine bots interact with your site. Both provide valuable but different perspectives for SEO optimization.
Can small businesses benefit from log file analysis?
Yes, small businesses can benefit from log file analysis, as it can uncover fundamental SEO issues that might be hindering their online visibility, regardless of their website’s size.
Can log file analysis improve site speed?
By analyzing log files, you can identify slow-loading pages or server issues, allowing you to make optimizations that improve overall site speed and performance.
What is the best way to learn log file analysis?
The best way to learn is through a combination of online resources, such as blogs and webinars, and hands-on experience with log file analysis tools and real data sets.