
How to Fix Crawl Errors 404 in Google Webmaster Tools


Seeing 12,000 crawl errors staring back at you in Webmaster Tools can make eradicating them seem like an insurmountable task. The key is to know which errors are the most crippling to your site, and which ones are simply informational and can be brushed aside so you can deal with the real meaty problems. It’s important to keep a close eye on your errors because of the impact they have on your users and on Google’s crawler.
Having thousands of 404 errors, especially for URLs that are indexed or linked to by other pages, makes for a poor user experience. If users land on multiple 404 pages in one session, their trust in your site decreases, which leads to frustration and bounces.
You also don’t want to miss out on the link juice from other sites that point to a dead URL on your site. If you fix that crawl error and redirect it to a good URL, you can capture that link to help your rankings.
Additionally, Google allots a crawl budget to your site. If a lot of the robot’s time is spent crawling your error pages, it doesn’t have time to get to your deeper, more valuable pages that are actually working.
Without further ado, here are the main categories that show up in the crawl errors report of Google Webmaster Tools:

HTTP

This section usually lists pages that returned errors such as 403 (forbidden), which are not the biggest problems in Webmaster Tools. For a list of all the HTTP status codes, check out Google’s own help pages. Also check out SEO Gadget’s amazing Server Headers 101 infographic on SixRevisions.

In Sitemaps

Errors in sitemaps are often caused by old sitemaps that have since 404’d, or pages listed in the current sitemap that return a 404 error. Make sure that all the links in your sitemap are quality working links that you want Google to crawl.
One frustrating thing Google does is continually crawl old sitemaps that you have since deleted, to check that the sitemap and its URLs are in fact dead. If you have removed an old sitemap from Webmaster Tools and don’t want it crawled, make sure you let that sitemap 404 and that you are not redirecting it to your current sitemap.
From Google employee Susan Moskwa:
“The best way to stop Googlebot from crawling URLs that it has discovered in the past is to make those URLs (such as your old Sitemaps) 404. After seeing that a URL repeatedly 404s, we stop crawling it. And after we stop crawling a Sitemap, it should drop out of your "All Sitemaps" tab.”
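If you want to audit a sitemap yourself rather than wait for the report, something like the following rough Python sketch could work. It assumes a standard sitemaps.org `<urlset>` file; the function names (`sitemap_urls`, `broken_entries`, `http_status`) are made up for this illustration and are not part of any Google tool.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def http_status(url, timeout=10):
    """Return the final HTTP status for a URL (urllib follows redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers are raised as exceptions; the code is what we want.
        return err.code


def broken_entries(xml_text, get_status=http_status):
    """Return (url, status) pairs for sitemap URLs that do not answer 200."""
    broken = []
    for url in sitemap_urls(xml_text):
        status = get_status(url)
        if status != 200:
            broken.append((url, status))
    return broken
```

Passing your own `get_status` (for example a dictionary lookup) lets you dry-run the check without hitting the live site.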

Not Followed

Most of these errors are caused by redirect problems. Minimize redirect chains, keep any redirect timer short, and don’t use meta refreshes in the head of your pages.
Matt Cutts has a good YouTube video on redirect chains; start at 2:45 if you want to skip ahead.
What to watch for after implementing redirects:
  • When you redirect pages permanently, make sure they return the proper HTTP status code, 301 Moved Permanently.
  • Make sure you do not have any redirect loops, where the redirects point back to themselves.
  • Make sure the redirects point to valid pages and not 404 pages, or other error pages such as 503 (server error) or 403 (forbidden).
  • Make sure your redirects actually point to a page and are not empty.
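The checklist above can be sketched as a small Python script that follows a redirect one hop at a time. This is only an illustration under my own assumptions: the names `fetch_head` and `trace_redirects` and the `max_hops` limit of 5 are invented here, and the wording of the reported problems is arbitrary.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_head(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects on its own, which is what we want.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_head, max_hops=5):
    """Follow a redirect by hand; return (chain, problems).

    chain is a list of (url, status) hops. max_hops is an arbitrary
    safety limit chosen for this sketch.
    """
    chain, problems, seen = [], [], set()
    while True:
        if url in seen:
            problems.append(f"redirect loop back to {url}")
            break
        if len(chain) >= max_hops:
            problems.append(f"gave up after {max_hops} hops")
            break
        seen.add(url)
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES:
            # End of the line: a redirect should land on a working page.
            if len(chain) > 1 and status != 200:
                problems.append(f"redirect ends on a {status} page: {url}")
            break
        if status != 301:
            problems.append(f"{url} answers {status}, not 301")
        if not location:
            problems.append(f"{url} redirects to an empty location")
            break
        url = urljoin(url, location)
    redirects = sum(1 for _, status in chain if status in REDIRECT_CODES)
    if redirects > 1:
        problems.append(f"chain of {redirects} redirects")
    return chain, problems
```

An empty `problems` list means the URL either did not redirect at all or made a single 301 hop to a page answering 200.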
Tools to use:
  • Check your redirects with a response header checker tool like URI Valet or the Check Server Headers Tool.
  • Screaming Frog is an excellent tool to check which pages on your site are showing a 301 redirect, and which ones are showing 404 errors or 500 errors. The free version caps out at 500 pages per site; beyond that you would need to buy the full version.
  • The SiteOpSys Search Engine Indexing Checker is an excellent tool where you can put in a list of the URLs that you submitted as redirects. It lets you check your URLs in bulk to see which ones are indexed and which ones are not. If the original URLs you redirected are no longer indexed, Google has removed the old URL from its index after seeing the 301 redirect, and you can now remove that redirect line from your .htaccess file.
Examine the text-only version of your site: view the cached version from the site’s Google SERP listing, then select the text-only version. Make sure you can see all your links and that they are not being hidden by JavaScript, Flash, cookies, session IDs, DHTML, or frames.
Always use absolute and not relative links. If content scrapers scrape your images or links, they can reference your relative links on their site, and if those are improperly parsed you may see Not Followed errors show up in your Webmaster Tools. This has happened with one of our sites before, and it’s almost impossible to find out where the source link that caused the error is coming from.

Not Found

Not found errors are by and large 404 errors on your site. 404 errors can occur in a few ways:
  • You delete a page on your site and do not 301 redirect it
  • You change the name of a page on your site and don’t 301 redirect it
  • You have a typo in an internal link on your site, which links to a page that doesn’t exist
  • Someone else from another site links to you but has a typo in their link
  • You migrate a site to a new domain and the subfolders do not match up exactly
Best practice: if you are getting good links to a 404’d page, you should 301 redirect it to the page the link was supposed to go to, or, if that page has been removed, to a similar or parent page. You do not have to 301 redirect all 404 pages. Doing so can in fact slow down your site if you end up with way too many redirects. If you have an old page or a large set of pages that you want completely erased, it is OK to let these 404. It is actually the Google-recommended way to let the Googlebot know which pages you do not want anymore.
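To make that triage concrete, here is a hedged Python sketch that turns a list of 404’d paths into `Redirect 301` lines (the Apache mod_alias directive) only for pages that still receive links. The input format, the function name `htaccess_redirects`, and the `min_links` threshold are assumptions for illustration; you would build the link counts from whatever backlink data you have.

```python
def htaccess_redirects(not_found, min_links=1):
    """Build 'Redirect 301' lines for 404 URLs that are worth saving.

    not_found maps an old path to (inbound_link_count, replacement_path).
    Pages with too few links, or with no sensible replacement (None),
    are skipped and simply left to 404.
    """
    lines = []
    for old_path in sorted(not_found):
        inbound_links, target = not_found[old_path]
        if inbound_links >= min_links and target:
            lines.append(f"Redirect 301 {old_path} {target}")
    return lines


# Hypothetical report: path -> (links pointing at it, best replacement page)
report = {
    "/old-guide.html": (14, "/guides/seo.html"),
    "/tmp/test-page.html": (0, None),
    "/discontinued-widget.html": (3, "/widgets/"),
}
for line in htaccess_redirects(report):
    print(line)
```

Paths containing spaces would need quoting before they go into a real .htaccess file, which this sketch does not handle.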
There is an excellent Webmaster Central Blog post on how Google views 404 pages and handles them in Webmaster Tools. Everyone should read it, as it dispels the common “all 404s are bad and should be redirected” myth.
Rand also has a great post on whether 404s are always bad for SEO.

Restricted by robots.txt

These errors are more informational, since they show that some of your URLs are being blocked by your robots.txt file. The first step is to check your robots.txt file and ensure that you really do want to block the URLs being listed.
Sometimes there will be URLs listed here that are not explicitly blocked by the robots.txt file. These should be looked at individually, as some of them may be there for strange reasons. A good way to investigate is to run the questionable URLs through URI Valet and check the response code. Also check your .htaccess file to see if there is a rule that is redirecting the URL.
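One way to double-check which listed URLs your robots.txt really blocks is Python’s standard `urllib.robotparser` module, as in the sketch below. The helper name `blocked_urls` is made up, and keep in mind that Python’s parser does not necessarily interpret every rule (wildcards, for instance) the same way Googlebot does, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


robots = """User-agent: *
Disallow: /private/
"""
reported = [
    "http://example.com/private/page.html",
    "http://example.com/public.html",
]
print(blocked_urls(robots, reported))
```

Any reported URL that does not come back from this check is a candidate for the individual investigation described above.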

Soft 404s

If you have pages with very thin content, or pages that look like a landing page, they may be categorized as soft 404s. This classification is not ideal. If you want a page to 404, make sure it returns a hard 404. If one of your main content pages is listed as a soft 404, you need to fix that page so it no longer gets this error.
If you are returning a 404 page and it is listed as a Soft 404, it means that the HTTP response header does not carry the 404 Not Found status code. Google recommends “that you always return a 404 (Not found) or a 410 (Gone) response code in response to a request for a non-existing page.”
We saw a bunch of these errors with one of our clients when we redirected a ton of broken URLs to a temporary landing page which only had an image and a few lines of text. Google saw this as a custom 404 page, even though it was just a landing page, and categorized all the redirecting URLs as Soft 404s.
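A quick way to test your own site for this is to request a URL that cannot exist and look at the status code that comes back. The Python sketch below only builds such a probe URL and interprets the result; the names `probe_url` and `missing_page_verdict` are hypothetical, and fetching the status (after following any redirects) is left to whatever header checker you prefer.

```python
import uuid
from urllib.parse import urljoin


def probe_url(site_root):
    """Build a URL that almost certainly does not exist on the site."""
    return urljoin(site_root, "/" + uuid.uuid4().hex + ".html")


def missing_page_verdict(status):
    """Interpret the final status code returned for a non-existent page."""
    if status in (404, 410):
        return "hard 404"      # what Google recommends
    if 200 <= status < 400:
        return "soft 404"      # the server claims the missing page is fine
    return "other error"       # e.g. a 5xx answer, worth a separate look


print(probe_url("http://example.com/"))
print(missing_page_verdict(200))
```

If the probe comes back as a soft 404, your error page (or a catch-all redirect like the one described above) is answering with the wrong status code.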

Timed Out

If a page takes too long to load, the Googlebot will stop trying to fetch it after a while. Check your server logs for any issues and check the load speed of the pages that are timing out.
Types of timed out errors:
  • DNS lookup timeout – the Googlebot request could not get to your domain’s server, check DNS settings. Sometimes this is on Google’s end if everything looks correct on your side. Pingdom has an EXCELLENT tool to check out the DNS health of your domain and it will show you any issues that pop up.
  • URL timeout – an error from one of your specific pages, not the whole domain.
  • Robots.txt timeout – If your robots.txt file exists but the server timed out when Google tried to crawl it, Google will postpone the crawl of your site until it can reach the robots.txt file to make sure it doesn’t crawl any URLs that were blocked by the robots.txt file. Note that if you do not have a robots.txt and Google gets a 404 from trying to access your robots.txt, it will continue on to crawl the site as it assumes that the file doesn’t exist.
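The robots.txt behavior described in the last bullet boils down to a small decision table. The Python sketch below simply restates it; it is my own summary of the rules in this post, not Google’s actual implementation, and the function name and return strings are invented.

```python
def robots_txt_decision(status=None, timed_out=False):
    """Summarize how the crawl proceeds for a given robots.txt outcome."""
    if timed_out:
        # The file may exist, so the crawl is postponed rather than risk
        # fetching URLs that robots.txt blocks.
        return "postpone crawl"
    if status == 404:
        # No robots.txt at all: nothing is blocked.
        return "crawl everything"
    if status == 200:
        return "crawl, obeying robots.txt"
    # Other outcomes are not covered in this post.
    return "not covered here"


print(robots_txt_decision(timed_out=True))
print(robots_txt_decision(status=404))
```

The practical takeaway is that a slow robots.txt can stall crawling of the whole site, while a missing one does not.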

Unreachable

Unreachable errors can occur from internal server errors or DNS issues. A page can also be labeled as Unreachable if the robots.txt file is blocking the crawler from visiting a page. Possible errors that fall under the unreachable heading are “No response”, “500 error”, and “DNS issue” errors.
There is a long list of possible reasons for unreachable errors, so rather than list it here, I’ll point you to Google’s own reference guide here. Rand also touched on the impact of server issues back in 2008.

Conclusion

Google Webmaster Tools is far from perfect. While we all appreciate Google’s transparency in showing us what they are seeing, there are still some things that need to be fixed. To start with, Google is the best search engine in the universe, yet you cannot search through your error reports to find that one URL from a month ago that was keeping you up at night. They could at least have made up for this with good pagination, but no: you have to click through 20 pages of data one by one to get to page 21. One workaround is to change the page number at the end of the URL string, which shows what part of the errors list you are looking at. You can also download all of the data into an Excel document, which is the best solution, but Google should still upgrade Webmaster Tools to allow searching from within the application.
Also, the owner of a site should have the ability to delete ALL sitemaps on the domain they own, even ones that someone else uploaded a year ago. Currently you can only delete the sitemaps that you yourself uploaded through your Webmaster Tools account. If Jimmy from Agency X uploaded an image sitemap a year ago before you let them go, it will still show up in the All Sitemaps tab. The solution is to let the sitemap 404, and it will drop off eventually, but it can be a thorn in your side to have to see it every day until it leaves.
Perhaps, as Bing starts to upgrade its own Webmaster Tools, we will begin to see some more competition between the two search engines in their product offerings. Then one day, just maybe, we will get complete transparency and complete control of our sites in the search engines.
