
You might be hurting your SEO with a GitHub Pages SPA

In theory, GitHub Pages is a great way to host the frontend of your single-page application, whether it is built in React, Angular, or any number of similar frameworks. It is fast, easy to configure, and comes with automation capabilities to build your application directly from source. Even cursory research into the topic will reveal a critical flaw and a convenient workaround, but that workaround could end up hurting your SEO in the long run.

GitHub Pages Does Not Natively Support SPAs

GitHub Pages lacks a direct way to rewrite incoming traffic to a single file (your index.html), a key step in configuring a single-page application without using query parameters. With another hosting solution like Apache, one could use an .htaccess file to accomplish this, but with GitHub Pages, there are two possible workarounds you'll need to choose between.

Option 1 is to put the route in a hash fragment or query string instead of the path. For example, instead of your URLs looking like example.com/posts/5, they would look like example.com/#/posts/5 or example.com/?posts/5. This avoids the issue because the HTTP request itself is made to the index: a query string is passed along as a parameter to that request, and a hash fragment is never sent to the server at all, so in both cases your single-page application reads the route on the client. This works fine, but requires some additional setup, both to configure the routing of your SPA and to tell search engines how to index your site.
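
To make the difference between the URL styles concrete, here is a minimal sketch in Python (used purely for illustration; your SPA's router would do the equivalent in the browser). The function name split_request_and_route is made up for this post, and the rule that a query string holds the whole route is an assumption based on the example.com/?posts/5 example above.

```python
from urllib.parse import urlsplit


def split_request_and_route(url: str) -> tuple[str, str]:
    """Return (path the server is asked for, route the SPA would handle).

    Hypothetical helper for illustration only. It assumes the route lives
    in the hash fragment or, failing that, in the raw query string.
    """
    parts = urlsplit(url)
    requested_path = parts.path or "/"

    if parts.fragment.startswith("/"):
        # example.com/#/posts/5 -> the fragment never reaches the server
        route = parts.fragment
    elif parts.query:
        # example.com/?posts/5 -> the server still only looks up "/"
        route = "/" + parts.query.lstrip("/")
    else:
        # example.com/posts/5 -> the server itself must resolve this path
        route = requested_path

    return requested_path, route
```

For both example.com/#/posts/5 and example.com/?posts/5 the requested path comes back as "/", which GitHub Pages can serve. For example.com/posts/5 the requested path is "/posts/5", a file that does not exist on the server.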

Option 2 is specific to the way GitHub Pages handles a page not being found. If you have a file located at /404.html on your GitHub Pages site, that page will be served instead of a generic error page. If you have tried to solve this problem yourself, copying your index.html to 404.html is likely the solution you have seen recommended by countless Medium articles, continuous integration scripts, and GitHub issues on the topic.
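
For reference, the copy step those scripts perform can be sketched in a few lines of Python. This is only an illustration of the workaround, not a recommendation: the function name add_spa_fallback is hypothetical, and it assumes your build output directory already contains an index.html.

```python
import shutil
from pathlib import Path


def add_spa_fallback(build_dir: str) -> Path:
    """Copy index.html to 404.html inside build_dir and return the new path.

    Hypothetical helper: build_dir is whatever folder you publish to
    GitHub Pages.
    """
    index = Path(build_dir) / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html found in {build_dir}")

    # GitHub Pages serves /404.html for any path it cannot find,
    # so the copy makes the SPA load on every unknown URL.
    fallback = index.with_name("404.html")
    shutil.copyfile(index, fallback)
    return fallback
```

You would call something like add_spa_fallback("dist") after your build finishes, where "dist" is an assumed output folder name.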

The Problem

The hidden caveat to this solution is that while GitHub will indeed now serve your single-page application on every request that would otherwise result in an error page, internally it still believes it has encountered an error. As such, it will send a 404 status code along with your page. The page renders normally, so the status code is invisible to you in the browser, and even in tools like curl unless you enable verbose output. I myself only found this out recently, while diagnosing page indexing issues reported by Google's Search Console. A page on one of my single-page applications was linked to from another website, and Google reported that it tried to index the URL for its search engine but was unable to, as the URL returned a 404 status code.
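
If you want to check your own site, here is a minimal Python sketch that reports the status code together with the body, using only the standard library. The function name fetch_status is made up for this post, and any URL you pass it is your own; note that urlopen raises an HTTPError for 4xx and 5xx responses, so the sketch catches it to get at the page that was still sent.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> tuple[int, bytes]:
    """Return (HTTP status code, response body) for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # A 404 raises here, but the error object still carries the body,
        # which on an affected site is your full single-page application.
        return error.code, error.read()
```

On a site using the 404.html workaround, a deep link such as the example.com/posts/5 style URL should come back with status 404 and a non-empty body, which is exactly the combination a crawler treats as a missing page.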

At first I thought I could use Cloudflare, which I already had set up, to rewrite the 404 status on these specific pages to 200, as there are only a few, but sadly, as far as I could tell, the platform lacks this feature. It seems the only true solutions are to wait for GitHub to expand its configuration settings for Pages, or to switch to a different hosting solution, as I did.

