A soft 404 is not a true 404 as defined by the HTTP response: the page actually returns a different response code, but is treated like a 404.
How Does This Differ From A ‘Regular’ 404?
A soft 404 is most commonly a 200 response page acting as a custom 404 page. Crawlers can also class a page as a soft 404 if it exists but has little or no content – such as empty pages, old categories or template pages.
This can be a bit confusing if you aren’t familiar with response codes and how they are read, but Google put it like this:
“This is like a giraffe wearing a name tag that says ‘dog’. Just because the name tag says it’s a dog, doesn’t mean it’s actually a dog. Similarly, just because a page says 404, doesn’t mean it’s returning a 404-status code.”
So, imagine a page as having two ways of showing it is a 404 page – one to the user and one to a crawler:
404 Error – looks like a 404 page to users, and crawlers see a 404 HTTP response.
Soft 404 Error – may or may not look like a 404 page to users; crawlers see a 200 response code but judge that the page should be a 404.
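To make the mismatch concrete, here is a minimal sketch using only Python's standard library: a toy local server (purely illustrative – real soft 404s come from your CMS or server config) that serves a page a human would read as "not found" while telling crawlers everything is fine.

```python
import http.server
import threading
import urllib.request

class Soft404Handler(http.server.BaseHTTPRequestHandler):
    """Serves a page that *looks* like a 404 but returns a 200 status."""

    def do_GET(self):
        page = b"<h1>Sorry, page not found</h1>"
        self.send_response(200)  # the soft 404: the status should be 404
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, *args):
        pass  # silence request logging

# Port 0 asks the OS for any free port
server = http.server.HTTPServer(("127.0.0.1", 0), Soft404Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/no-such-page"
with urllib.request.urlopen(url) as resp:  # a real 404 would raise HTTPError
    status, body = resp.status, resp.read()
server.shutdown()

print(status)  # 200 – a human sees "page not found", a crawler sees success
```

The giraffe in the name tag: the body says one thing, the status line says another, and crawlers believe the status line first.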
There are numerous ways for Soft 404s to emerge, but the main two causes are:
Incorrect Custom 404 Response
As mentioned above, if missing pages on your site display or redirect to a custom 404 page, you might be serving a 200 response – and I don’t need to remind you that 404 and 200 are different numbers!
You can still have that custom 404 page, but it needs to return a 404 response! This can be checked with various tools and plugins, such as a redirect path browser plugin, the list mode in Screaming Frog or services like https://httpstatus.io/.
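Done correctly, the friendly page and the 404 status go together. The same stdlib sketch as above, fixed (in practice this belongs in your server or CMS configuration, not application code – the sketch just shows what a checker should see):

```python
import http.server
import threading
import urllib.error
import urllib.request

class Custom404Handler(http.server.BaseHTTPRequestHandler):
    """A custom 404 page done right: friendly HTML *and* a 404 status."""

    def do_GET(self):
        page = b"<h1>Sorry, we can't find that page</h1>"
        self.send_response(404)  # correct status code for a missing page
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, *args):
        pass  # silence request logging

server = http.server.HTTPServer(("127.0.0.1", 0), Custom404Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/missing-page"

# urlopen raises HTTPError for 4xx/5xx, so catching it reveals the real code
try:
    urllib.request.urlopen(url)
    status = 200
except urllib.error.HTTPError as e:
    status = e.code

server.shutdown()
print(status)  # 404 – users still get a helpful page, crawlers get the truth
```

This is exactly the check the tools above perform for you: request the URL and read the status line, not the page content.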
Moz have a blog on helpful robots.txt directives, with number 5 showing how to set up a custom 404 error page.
Automatically Generated Pages
These are often automatically generated by certain CMSs – an obvious example being WordPress.
If you write a blog post and add a new tag, WordPress will by default create an archive page listing all posts with that tag. If the tag page exists but no published posts use the tag, you are left with an empty page.
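The mechanism is simple enough to sketch in a few lines. This is an illustrative toy model, not WordPress code – the post titles and tags are made up – but it shows how a tag with no published posts quietly becomes a thin archive page:

```python
# Toy model of a CMS tag archive: each tag gets its own page, whether
# or not any published post actually carries that tag.
posts = [
    {"title": "Spring sale", "tags": ["sale", "news"]},
    {"title": "New range", "tags": ["news"]},
]
all_tags = ["sale", "news", "discontinued"]  # "discontinued" was added but never used

# Tags whose archive pages would render with zero posts – soft 404 candidates
empty_tags = [tag for tag in all_tags
              if not any(tag in post["tags"] for post in posts)]

print(empty_tags)  # ['discontinued']
```

Each slug in `empty_tags` corresponds to a live URL serving a 200 response with no meaningful content – precisely the thin pages crawlers flag as soft 404s.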
Soft 404 Issues
You have some Soft 404s in Search Console, but does it matter for SEO?
Short answer – yes. Long answer – also yes.
404s are not damaging when they are correctly labelled, as you will naturally have old products, old posts and other pages which are no longer used. But if you start getting Soft 404s on pages which should be indexed, crawlers will get confused and not know where to serve your pages in results.
This mostly boils down to the crawl budget of a site. This is how often your site gets crawled and how many pages get crawled. For example, your 4000 page site might get fully crawled on a monthly basis, but your homepage could be crawled daily if it is deemed important enough.
Crawls cost time and energy, so it makes sense for search engines to crawl the more important pages more often and to leave those obscure category pages for a quieter time.
When we put crawl budgets and Soft 404s together, we end up with crawlers seeing and crawling 200 pages which should really be 404s. This eats into the crawl budget, creates needless work and can leave genuine 200 pages un-crawled, thus reducing visibility for your pages.
Now, this might not be an issue for a website with a dozen or so pages, but if a larger site starts getting Soft 404s, it can quickly multiply and create a large crawl budget issue.
Fixing Soft 404 Errors
Firstly, you need to identify the pages marked as Soft 404s. If there are fewer than 1,000 URLs, you can download these straight from Google Search Console. If there are more, you will have to either fix the first 1,000 and then work through the next batch, or use a third-party tool which can download them through Google’s API.
Then you will need to find out the actual response code of these pages. As mentioned previously, there are plugins, websites and third-party tools which can help you do this en masse, with the easiest being everyone’s favourite – Screaming Frog.
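If you would rather script it than use a tool, a bulk check is a short piece of stdlib Python. This is a minimal sketch (function names are my own; note that `urlopen` follows redirects by default, so a redirecting URL reports the final hop's status):

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def status_of(url):
    """Fetch one URL and return its HTTP status code.

    urlopen raises HTTPError for 4xx/5xx responses, so we catch it
    to read the code rather than treating it as a failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

def check_statuses(urls, fetch=status_of, workers=8):
    """Map each URL to its status code, fetching in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))

# Usage (network required):
# statuses = check_statuses(["https://example.com/old-page"])
```

Any URL from the Soft 404 report that comes back as 200 is a confirmed soft 404 and goes into one of the groups below; a genuine 404 or 410 usually just needs time for Search Console to catch up.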
Once you have your list of URLs they will need to be assigned to one of three groups:
The page is no longer available – For pages with no direct replacement, the URL should return a 404 (not found) response code. This clearly tells browsers and crawlers that the page no longer exists. It can still display a custom 404 page, but that page must be served with a 404 status as described previously, not via a catch-all redirect.
The page has moved – The URL should be 301 redirected to its clear replacement. Other redirect response codes can be used, but the general rule is to use a 301 unless you have a specific reason to use another.
The page is incorrectly marked as a Soft 404 – This is when pages are thin or at least deemed thin by search engines. Most of the time this will be for pages which have little value, or perhaps duplicate content such as blog tag pages, filtered pages, etc. This should be handled by canonicals and robots.txt disallows.
If you still see pages which you deem to be important enough to be indexed, then you will need to improve these pages through improved content and linking.
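The three groups above can be condensed into a simple decision rule. This is a hypothetical helper of my own (the inputs – whether a replacement exists and whether the page deserves to rank – come from your own analysis, not from any API):

```python
def triage(status, has_replacement, should_rank):
    """Suggest an action for a URL flagged as a soft 404.

    Mirrors the three groups: moved pages get a 301, pages worth
    keeping get improved, and dead pages get a genuine 404.
    """
    if has_replacement:
        return "301 redirect to the replacement URL"
    if should_rank:
        return "improve content and internal linking, keep as 200"
    if status == 200:
        return "serve a genuine 404 response"
    # Already returning a hard error – just wait for crawlers to recrawl
    return "no change needed"
```

For example, `triage(200, False, False)` – a live but worthless page – suggests converting it to a genuine 404, while a thin-but-valuable page gets the content-improvement route instead.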
Once you have cleared down your Soft 404 errors, crawlers will have a clear path through to your important content without wasting their crawl budget and your bandwidth.
The improvements will be more marked the bigger the site, and the greater the number of Soft 404s fixed.
Anecdotally, the jumps in traffic we have seen have been primarily on ecommerce sites with changing products, category filters and other options which help create thin pages and incorrect redirects.
Unfortunately, this is an area which is often overlooked by SEOs and is not a consideration for many developers. It is a relatively easy fix from a technical standpoint, with the bulk of the work being the analysis and identification of these errors.
Try navigating to a non-existent page on your site and check which response code it actually returns. If you get a 200 response and you’re unsure of the next steps, get in touch with us to see how we can help.