How We Built This Blog: Jekyll, S3, Cloudfront with SSL/TLS, Route53, and a whole bunch of Ansible

Every good tech blog has a 'how to make a blog' post. Some even stop there!

This one will show you how we created this blog as a static site with easy editing; cheap, scalable hosting; and SSL/TLS enabled. I'll walk you through our little pipeline, and then show you the Ansible code we use to deploy it.

The Pipeline

Here are the main components that our site uses:

1. Jekyll

Jekyll is an awesome static site generator that lets you write your pages in Markdown. It's probably the most popular static site generator around, thanks in part to powering GitHub Pages. Markdown is a good choice for us because it's developer friendly, and this is a tech blog written by developers - though it's not the best fit for non-technical authors!
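
For example, a post is just a Markdown file with a YAML front matter block on top (the title here is made up for illustration):

---
layout: post
title: "How to Make a Blog"
---

Every good tech blog has a 'how to make a blog' post. Some even stop there!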

Because Jekyll outputs plain static files, we only need to put these online somewhere for the site to be live - no databases, backends, or other code to keep running. This is nice and simple, and allows us to focus on making YPlan.

2. S3

S3 is Amazon's Simple Storage Service (three S's = S3). It lets you store files for cheap, and can do some basic web serving.

It's possible to serve your site directly out of S3; however, we can't do that with a .yplanapp.com subdomain, since we have set up HSTS with includeSubDomains on, meaning everything on our domain must be served over HTTPS - something S3's website endpoints can't do for custom domains. S3 alone also isn't the fastest at serving web content.
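
Concretely, HSTS is just a response header our main site sends; once a browser has seen it, it refuses plain-HTTP connections to the domain and every subdomain until max-age expires (the max-age value here is illustrative):

Strict-Transport-Security: max-age=31536000; includeSubDomains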

3. Cloudfront

Cloudfront is Amazon's CDN: it distributes content quickly by caching it on a network of many servers worldwide. It can read directly from an S3 bucket, and since you only pay for usage, it's also quite cheap for a simple static site like this.

We have it set up in front of the S3 bucket, and give it a copy of our SSL certificate + private key so that it can serve tech.yplanapp.com properly. With custom SSL like this we can only support clients that send Server Name Indication (SNI); in practice this basically means not working with ancient clients such as IE6, which we're fine with.
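
If you're curious how this looks on the wire, you can compare the certificate a server presents with and without SNI using openssl (our hostname below; substitute your own):

# With SNI: presents the tech.yplanapp.com certificate
openssl s_client -connect tech.yplanapp.com:443 -servername tech.yplanapp.com </dev/null

# Without SNI: Cloudfront typically presents a default certificate instead,
# so non-SNI clients see a name mismatch
openssl s_client -connect tech.yplanapp.com:443 </dev/null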

4. Route53

Route53 is Amazon's DNS solution that we use for all of our DNS. For this blog we just need a single entry that points from tech.yplanapp.com to the Cloudfront distribution. Since the two are integrated, Route53 handles resolving the subdomain to the nearest Cloudfront server, which makes sure the site is served quickly wherever you are in the world.

Gluing it all together with Ansible

Ansible is our configuration management tool of choice at YPlan. We use it everywhere to automate all the things, from server provisioning to deployment to S3 configuration. Naturally, as much of this blog as possible is automated; here's a quick walkthrough.

We have a playbook called deploy_techblog.yml that starts like so:

- name: Deploying Techblog
  hosts: 127.0.0.1
  connection: local
  vars:
    aws_region: SOMETHING  # e.g. eu-west-1
    cloudfront_distribution_id: SOMETHING  # From Cloudfront console
    cloudfront_domain_name: SOMETHING.cloudfront.net  # From Cloudfront console
    s3_bucket: SOMETHING  # Will be automatically created
    techblog_root: ../../techblog  # Path to Jekyll directory
  tasks:

Since this playbook is used only to manage external resources (S3, Cloudfront, ...), we only need to connect to localhost, using the hosts and connection clauses. We also declare a few variables (including the AWS region, used when creating the bucket) to reduce repetition in the tasks that follow - if you're following along at home, you'll want to replace the SOMETHING values with your own.
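
Running a deploy is then a single command (depending on your inventory setup, you may need to pass the host explicitly, e.g. -i '127.0.0.1,'):

ansible-playbook deploy_techblog.yml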

Everything that follows is an entry in tasks. The first thing to do is to build the site with Jekyll:

- name: build blog
  command: jekyll build
  args:
    chdir: '{{ techblog_root }}'

In one command, Jekyll wipes the _site directory and rebuilds the whole site inside it. This makes sure no stale files are left around, e.g. if a page gets renamed.
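
The output is just plain files; a freshly built _site looks something like this (the exact contents depend on your posts and theme):

_site/
├── 404.html
├── index.html
├── css/
│   └── main.css
└── 2016/01/01/how-we-built-this-blog.html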

We then configure S3:

- name: s3 bucket
  s3:
    mode: create
    bucket: '{{ s3_bucket }}'
    region: '{{ aws_region }}'

- name: s3 bucket is public
  command: >
    aws s3api put-bucket-policy
      --bucket {{ s3_bucket }}
      --policy '{
          "Version":"2008-10-17",
          "Statement": [
              {
                  "Effect":"Allow",
                  "Principal": {"AWS": "*"},
                  "Action":["s3:GetObject"],
                  "Resource":["arn:aws:s3:::{{ s3_bucket }}/*"]
              }
          ]
      }'

- name: s3 bucket has versioning on
  command: >
    aws s3api put-bucket-versioning
      --bucket {{ s3_bucket }}
      --versioning-configuration Status=Enabled

- name: s3 bucket expires old versions
  command: >
    aws s3api put-bucket-lifecycle-configuration
      --bucket {{ s3_bucket }}
      --lifecycle-configuration '{
          "Rules": [
            {
              "ID": "Old versions disappear",
              "Prefix": "",
              "Status": "Enabled",
              "NoncurrentVersionTransitions": [
                {
                  "NoncurrentDays": 30,
                  "StorageClass": "STANDARD_IA"
                }
              ],
              "NoncurrentVersionExpiration": {
                "NoncurrentDays": 90
              }
            }
          ]
      }'

- name: s3 bucket website on
  command: >
    aws s3api put-bucket-website
      --bucket {{ s3_bucket }}
      --website-configuration '{
          "IndexDocument": {
              "Suffix": "index.html"
          },
          "ErrorDocument": {
              "Key": "404.html"
          }
      }'

This runs through several steps. The first uses Ansible's s3 module to create the bucket if it doesn't already exist; the following tasks then use the AWS CLI to activate S3 features on the bucket. Many plain commands don't work well with Ansible since they aren't idempotent, and re-running them either fails or does something bad; fortunately most AWS CLI commands are idempotent, so we don't need to find/code extra Ansible modules when the CLI can do it already :)
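
One wrinkle with command tasks: Ansible always reports them as 'changed', even when the AWS call didn't actually alter anything. If that bothers you, you can silence it with changed_when - a small tweak that could be applied to any of the CLI tasks above, shown here on the versioning task:

- name: s3 bucket has versioning on
  command: >
    aws s3api put-bucket-versioning
      --bucket {{ s3_bucket }}
      --versioning-configuration Status=Enabled
  changed_when: false  # the call is idempotent, so don't report a change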

The S3 features we turn on are:

  • Bucket Policy - we need the files to be public so Cloudfront can fetch them and serve them to the web.
  • Versioning - this is useful for backup purposes. Old versions of files, even if they were deleted, will be retrievable on S3.
  • Lifecycle Configuration - we don't need to keep old versions forever; we get S3 to transition them to 'infrequent access storage' after 30 days, which means they cost less to store (but more to access), and then delete them fully after 90 days.
  • Website Configuration - we need S3 to serve this as a website, which means we have to tell it the names of the index document (what you see at /), and the error document (which you see for missing content).
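
You can double-check that all of these settings took effect with the matching get- commands (bucket name is a placeholder):

aws s3api get-bucket-policy --bucket SOMETHING
aws s3api get-bucket-versioning --bucket SOMETHING
aws s3api get-bucket-lifecycle-configuration --bucket SOMETHING
aws s3api get-bucket-website --bucket SOMETHING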

Having configured the S3 bucket we then go on to upload the site content into it using another AWS CLI command:

- name: upload blog
  command: aws s3 sync --delete . s3://{{ s3_bucket }}
  args:
    chdir: '{{ techblog_root }}/_site'

The sync command uploads our _site directory into the bucket, skipping files that haven't changed. We add the --delete flag to remove from the bucket anything that no longer exists locally.
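
If you're nervous about --delete, sync supports a dry run that prints what it would do without touching anything:

aws s3 sync --delete --dryrun . s3://SOMETHING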

Now Cloudfront just needs to serve our content. Unfortunately we aren't automating the Cloudfront configuration with Ansible, since the configuration is incredibly big and distributions are slow to update. We just clicked around in the console to create it; our setup is basically a combination of Amazon's two guides, on Cloudfront + S3 and Cloudfront + Custom SSL - if you're following along, you'll need to work through those too.

After the S3 sync, we run a Cloudfront invalidation on the whole site:

- name: cloudfront invalidate
  command: >
    aws cloudfront create-invalidation
      --distribution-id {{ cloudfront_distribution_id }}
      --invalidation-batch '{
          "Paths": {
              "Quantity": 1,
              "Items": ["/*"]
          },
          "CallerReference": "techblog-deploy-{{ ansible_date_time.epoch }}"
      }'

This wipes any old cached copies from Cloudfront's worldwide cache network - if we didn't do this, it would keep serving the old version of the website for a long time, because we use caching headers. The other option would be to set a very short TTL on the objects in S3 (or no caching headers at all), forcing Cloudfront to re-fetch from S3 frequently; this might work better for bigger, frequently-updated sites, but since we don't update very often, invalidating on deploy is fine for us.
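
Invalidations take a few minutes to propagate; you can watch the Status field flip from InProgress to Completed with (distribution id is a placeholder):

aws cloudfront list-invalidations --distribution-id SOMETHING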

The final thing we've automated is the DNS entry:

- name: DNS
  route53:
    command: create
    zone: yplanapp.com
    record: tech.yplanapp.com
    type: A
    ttl: 300
    value: '{{ cloudfront_domain_name }}'
    alias: yes
    alias_hosted_zone_id: Z2FDTNDATAQYW2  # Fixed zone id used by all Cloudfront distributions
    overwrite: yes

We're using an alias record here, which gives us the automatic Cloudfront integration mentioned earlier: clients are pointed at the right edge server depending on their location.
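
You can see the alias at work with dig - the subdomain resolves directly to a set of Cloudfront edge IPs (the addresses below are illustrative):

$ dig +short tech.yplanapp.com
54.230.13.75
54.230.13.20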

Conclusion

We hope you've found this a useful guide to setting up static websites with AWS. If you have any hints on how to improve the setup, let us know!