Hosting a Personal Website on Google Cloud Storage: Some Ops Lessons


This site is served from a Google Cloud Storage static bucket. That makes it very easy to set up and deploy my website using Hugo, which can accept a GCS bucket URL as the deploy target. It's also extremely cheap: you pay per GB stored and per GB delivered in network requests. For a static website consisting of just HTML and CSS, storage and bandwidth costs are very low. For now, this is running me a few nickels per month.
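For reference, Hugo's deploy support works by declaring the bucket as a deployment target in the site config; the snippet below is a minimal sketch (the bucket name and target name are placeholders, not my actual config):

```toml
# In config.toml: declare a GCS bucket as a deploy target.
[deployment]
[[deployment.targets]]
name = "production"
URL = "gs://<your-bucket>"
```

After that, `hugo && hugo deploy` builds the site and syncs the changed files up to the bucket.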


TL;DR: just use Cloudflare

One thing that might be important to you is making your website available over HTTPS. I thought to myself, I've done this before, and I went to ask my friend certbot for help.

The LetsEncrypt HTTP challenge has you create a text file on your domain, available over HTTP under the path /.well-known/acme-challenge. However, it turns out that you cannot create a GCS bucket with this name! The fine folks in the comments of this StackOverflow answer have an explanation: the domain name of your bucket is actually a subdomain of storage.googleapis.com, which Google owns. Clearly I shouldn't be able to obtain a certificate for this Google-owned domain.

(Un)fortunately, that StackOverflow answer also provides a solution: use a DNS challenge instead. The corresponding certbot command is

certbot -d <your-domain> --manual --preferred-challenges dns certonly
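For the curious, the record certbot requests takes roughly this shape (example domain; the value stands in for the validation string certbot prints, and is a made-up placeholder here):

```
_acme-challenge.example.com.  300  IN  TXT  "<validation-string-from-certbot>"
```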

Certbot asks you to install a TXT record in your domain's DNS. This takes a bit to propagate, but not too long. At first I was checking with the wrong dig command. Make sure you specify that you're interested in a TXT record:

dig -t TXT _acme-challenge.<your-domain>
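Rather than re-running dig by hand, you could poll until the record shows up; here's a small sketch (the domain argument is a placeholder, and `wait_for_txt` is just a name I made up):

```shell
# Poll DNS until the _acme-challenge TXT record is visible, then return.
# Uses `dig +short`, which prints only the record data (empty if absent).
wait_for_txt() {
  local domain="$1"
  until dig +short -t TXT "_acme-challenge.${domain}" | grep -q .; do
    echo "waiting for TXT record to propagate..."
    sleep 30
  done
}

# wait_for_txt your-domain.example
```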

The DNS record soon propagated, and I got a shiny new LetsEncrypt cert for my site. Now I had to install it in my GCS bucket somehow. I thought this step would be easy.

However, I soon found out from this other StackOverflow answer that it is not. As detailed there, a GCS bucket does not support SSL for a domain pointed at it via CNAME. People are understandably disappointed that this is not possible, but let's give Google the benefit of the doubt for a minute. SSL termination requires at least a little more CPU than just serving files over HTTP, so it may be that the machines running GCS are not suitable for serving HTTPS traffic.

Now, they probably have frontend servers to terminate the incoming HTTP requests, followed by a short-hop API request to fetch the actual data. But cert management starts to be a bit of a headache: the frontend server would need to load the certificate in order to terminate SSL. If that certificate is in a bucket, then you'd be asking GCS to inspect your private data. Or you upload the cert through an admin console, and then select the bucket against which to authorize requests? This sounds like a load balancer. It's not hard to game out why Google might have decided none of this was worth the trouble, particularly when you consider how much I'm willing to pay them for the privilege.

Instead of serving HTTPS from Google Cloud Storage, you have to use Google Cloud Load Balancer. Napkin math tells us that the pricing for Load Balancing will work out to at least

\begin{equation*} \frac{\$0.025}{\text{hour}} \times 24 \frac{\text{hour}}{\text{day}} \times 30 \frac{\text{day}}{\text{month}} = \frac{\$18}{\text{month}}, \end{equation*}

which is way more than I intend to spend on this. You could run your own nginx server for less, to do load balancing, SSL termination, or just to serve your website's pages directly.
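As a sketch of that cheaper option, an nginx server terminating TLS in front of the bucket might look something like this. Everything here is a placeholder or an assumption on my part (domain, cert paths, bucket name), not a tested config:

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # The cert and key as issued by certbot
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        # Proxy to the bucket; GCS serves bucket contents under this host
        proxy_set_header Host storage.googleapis.com;
        proxy_pass https://storage.googleapis.com/<your-bucket>/;
    }
}
```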

In any case, I ended up asking Cloudflare to take care of this for me, which was very easy. A couple of lessons from that:

Access Logs

In order to get some very basic statistics about who is visiting the site, I didn't want to install Google Analytics. It seems like overkill, and under no circumstances would I want to run an ad, or help anyone else do that. Instead, I opted to do the old-school thing of just analyzing the server logs. And rather than shave that yak myself, someone pointed me to GoAccess, which is a nice, fast, terminal-based log analyzer. The access logs are set up, per the instructions, to go to a GCS bucket alongside this website's.

Once you have access logs appearing in your logs bucket, it's a simple matter to pull them down and run the analyzer:


# Copy all the logs down to your machine
gsutil -m cp -R gs://<your-logs-bucket> .

# Skip the first line of each usage log file
awk 'FNR > 1 { print; }' * > access.log

# Point goaccess at the access logs
goaccess access.log --log-format=CLOUDSTORAGE --config-file=/usr/local/etc/goaccess/goaccess.conf

Obviously pulling all of the logs down every time this script runs is not going to be sustainable forever, but that doesn't matter yet. goaccess supports a streaming mode, so it might be possible to jury-rig a streaming process from GCS. For now, a more pressing concern is that installing Python 3.8 appears to have broken my gsutil install. The battle against software kipple continues.
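Short of full streaming, one incremental improvement would be `gsutil rsync`, which only copies objects that changed since the last run. A sketch, with the bucket name as a placeholder and `sync_logs` a name of my own invention:

```shell
# Incrementally mirror the logs bucket into a local directory, then
# re-combine the files, skipping each file's header line as before.
sync_logs() {
  local bucket="$1" dest="$2"
  mkdir -p "$dest"
  gsutil -m rsync -r "$bucket" "$dest"
  awk 'FNR > 1' "$dest"/* > access.log
}

# sync_logs gs://<your-logs-bucket> logs && goaccess access.log --log-format=CLOUDSTORAGE
```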