- August 25, 2017
- January 13, 2016
When WordPress permalinks conflict with static .html pages in the web root, pages render for users but generate 404 errors for crawlers. Solve the issue in three steps:
- Go to the web root on the site’s FTP.
- Look for .html files from an outdated, static version of the site that share the same file path as current WordPress permalinks.
- Rename or remove the old .html files to resolve the “Soft 200” issue and get critical pages back in the index.
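For sites with many permalinks, the comparison in the second step can be scripted. The following is a minimal sketch, assuming you have a local copy of the web root and a list of permalink paths exported from WordPress. The function name `find_conflicts` and its parameters are hypothetical, not part of any WordPress tooling.

```python
from pathlib import Path


def find_conflicts(web_root, permalinks):
    """Return static files in web_root whose path matches a permalink.

    `permalinks` holds site-relative paths such as "/about/".
    Both names here are illustrative assumptions.
    """
    root = Path(web_root)
    conflicts = []
    for link in permalinks:
        slug = link.strip("/")
        if not slug:
            continue  # skip the home page; it has no slug to compare
        # Check the bare file name and the .html variant of the slug
        for candidate in (root / slug, root / f"{slug}.html"):
            if candidate.is_file():
                conflicts.append(candidate)
    return conflicts
```

Calling `find_conflicts("/path/to/web-root", ["/about/", "/services/"])` returns the files worth renaming or removing; the path shown is a placeholder.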
Origins of the “Soft 200”
In March 2015, Glenn Gabe explained the potentially catastrophic effect of “Soft 200s”—pages that render perfectly for users but return a 404 header status to bots and browsers. These pages get deindexed by search engines without providing visual clues to business owners or web design teams. Without crawling a site or checking the header status, the issue could go unnoticed for months—or years.
At the time, Gabe had a full roster of clients and was unable to take on more work; he diagnosed the issue, but its remedy remained elusive.
Just prior to Gabe’s post, we encountered the same issue on two critical subdirectories of a client site. Pages rendered, but the header status displayed a 404, and Google had dropped the pages from its index. With the help of developer Kamen Gordon and his team at Stovepipe, we were able to solve the issue. Soft 200s are pernicious for search visibility, but there’s a simple and effective cure.
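The mismatch described above (a page that renders while the header status says 404) can be checked without a full crawl. Here is a rough sketch using Python's standard library. The helper names and the idea of matching a known phrase from the page are assumptions made for illustration, not the method Gabe or Stovepipe used.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url):
    """Return (status_code, body_text) without raising on 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses, but the body is still readable
        return err.code, err.read().decode("utf-8", errors="replace")


def looks_like_rendered_404(status, body, expected_text):
    """True when the status is 404 but the real page content is present.

    `expected_text` is a phrase you know appears on the live page,
    such as its headline (a hypothetical check, chosen by the caller).
    """
    return status == 404 and expected_text in body
```

A typical use would pass the result of `fetch_status_and_body(url)` to `looks_like_rendered_404` along with the page's headline. A themed 404 page will not contain that headline, so it is not flagged.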
Solving the Soft 200 Problem
Soft 200s occur because outdated .html files located in the web root match permalinks within WordPress. (The Soft 200 problem may also occur on other CMS platforms, but our experience and testing were in WordPress.) Older sites born as static sites and later migrated to a CMS are in particular danger. During that migration process, static files left in the web root that conflict with WordPress permalinks may generate a Soft 200.
In the case of our client, the issue was recent and represented a different complication. A handful of static .html files were left in the web root during a past transition to WordPress, but because the migration also restructured site architecture, post-transition permalinks did not immediately conflict with static .html files. Fast forward a few years, and further site reorganization generated new permalinks that conflicted with those long-forgotten static files.
Resolving the issue was as simple as moving all outdated .html files into an archive folder, where they wouldn’t conflict with existing—or future—permalinks. Deleting the files was another viable but unnecessary option.
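Moving the files can also be scripted. The sketch below assumes a local or shell-accessible copy of the web root and a list of files you have already confirmed are outdated. The function name and the default folder name `archive` are hypothetical; pick a folder name that does not itself match a permalink.

```python
import shutil
from pathlib import Path


def archive_static_files(web_root, relative_paths, archive_name="archive"):
    """Move the listed files into an archive folder inside web_root.

    Files that do not exist are skipped. Returns the new locations.
    """
    root = Path(web_root)
    archive = root / archive_name
    moved = []
    for rel in relative_paths:
        source = root / rel
        if not source.is_file():
            continue
        target = archive / rel
        # Recreate any subdirectories so relative paths are preserved
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved
```

Passing an explicit list, instead of moving every .html file found, keeps any static files the site still needs in place.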
Technical Details and Enduring Mysteries of Soft 200s
We traced at least some of the issue back to Apache configuration. Many Apache configurations list index.html before index.php, so the server looks first for the .html version of a page before its .php equivalent. While this order can be changed with the DirectoryIndex directive in the .htaccess file, there was no DirectoryIndex directive in the client’s .htaccess file. (No changes were made to the .htaccess file to resolve the issue.) This meant that browsers and crawlers encountered the outdated .html versions of pages before finding their .php equivalents through WordPress.
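The lookup order described above can be pictured with a toy model. This is not Apache's implementation, only a sketch of the "first match wins" behavior. The function name is hypothetical, and the default order is an assumption about how a given server is configured.

```python
def pick_index_file(existing_files, directory_index=("index.html", "index.php")):
    """Return the first name in directory_index that exists, or None.

    `existing_files` is the set of file names present in a directory.
    """
    for name in directory_index:
        if name in existing_files:
            return name
    return None
```

With both files present, the default order returns `index.html`. Reversing the tuple returns `index.php`, which mirrors what changing the directive order would do.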
What we haven’t solved is exactly what happens when a request hits the server and finds the .html file with a matching permalink in WordPress. The ability of the browser to render the page despite the 404 header status and bots’ inability (or unwillingness) to do so may reflect a greater capacity by browsers to find the requested file, or that bots are programmed not to progress past a 404 error. (It’s easy to understand why a search engine would not risk indexing and serving a page with a 404 header status.)
I emulated the issue on a (much-neglected) personal domain. Simply adding a file named “about” at the web root generated a 404 header status while still rendering the /about/ page for visitors. (See image above.) Renaming or removing the file made the issue disappear just as quickly.
A couple of caveats to this mini-experiment: I was able to generate the 404 error only when the file had no .html extension. Adding the extension caused, for me, a 500 server error instead of a 404. This differs from what Stovepipe found and fixed, which was a group of .html files generating 404 errors. (Those files may have been corrupted, or the client site may have had different Apache settings outside the .htaccess file.) The test site also used SSL; the client site that experienced the error was not secure.
Thankfully, diagnosing and solving the Soft 200 issue doesn’t require understanding the nuances of how servers handle these requests. Instead, Soft 200s are an opportunity to provide immediate value through technical SEO and get a quick and powerful win for clients. And I’ll take as many of those as I can get.