My Site Project: Part 3 - Automating the Site's Deployment to S3 Bucket
In this part of the series, I’ll automate the site deployment process described by Mike Tabor.
In his tutorial, Mike explains how to deploy a static site to an S3 bucket on AWS and then serve it with Cloudflare.
Cloudflare is a very versatile service with many perks included in its free plan, such as acting as a reverse proxy and providing DNS management. Moreover, we don’t have to generate an SSL certificate ourselves, since Cloudflare can provide one.
To recap, we automated the site build process in the previous post and now we have static content generated by Jekyll. In this post, we’ll upload the content to AWS automatically with a Docker container.
Disclaimer:
I was not affiliated with either AWS or Cloudflare while writing this post. Use these services at your discretion, as they may incur costs, and familiarize yourself with the plans and offers of all the services listed above.
Prerequisites
- You own a domain and can manage its DNS entries
- You have an AWS account set up
- You installed the AWS CLI tool on your computer
- You have a Cloudflare account set up
- Some knowledge of Python
- We’ll use the boto3 package to deploy the site
Set Up
Follow the steps provided by Mike in his post. A small remark: you can skip configuring the bucket policy, as your bucket will be publicly accessible anyway.
Before using the AWS CLI, AWS recommends configuring a dedicated IAM account for programmatic access. In our case, this account will be dedicated to the deployment automation. Give it as few permissions as possible (for example, allow only list/read/update/write/delete operations at the S3 bucket level).
After creating the account and assigning access to it, generate the access keys and add them to the CLI on your computer by issuing the aws configure command. The first time you run the command, it’ll ask for the credentials you just generated.
The credentials you enter on your machine will later be passed to the deployment container.
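If you’d like to verify that the credentials you just entered actually grant access to your bucket, a quick boto3 sketch such as the one below can help. It’s purely optional and not part of the deployment itself; the script name and the bucket name are placeholders of my own, so replace the latter with your bucket’s name.

#!/usr/local/bin/python
# check_access.py - an optional sanity check for the newly configured IAM credentials
import boto3
from botocore.exceptions import ClientError

BUCKET_NAME = 'www.yourdomain.com'  # placeholder - replace with your bucket's name


def check_bucket_access(bucket_name):
    # boto3 picks up the credentials saved by the aws configure command
    s3_client = boto3.client('s3')
    try:
        # head_bucket fails if the bucket doesn't exist or the credentials lack access to it
        s3_client.head_bucket(Bucket=bucket_name)
        print(f'Access to {bucket_name} confirmed')
    except ClientError as error:
        print(f'Cannot access {bucket_name}: {error}')


check_bucket_access(BUCKET_NAME)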
The CI Steps
We’ll automate some of the steps of the CI/CD (continuous integration/continuous deployment) process:
- Building the site
- Automatically uploading the generated content to the S3 bucket
The CI/CD process includes some additional steps that we’ll review further in the series (stay tuned).
Building the Site’s Contents
Let’s recall the previous post, where we ran Jekyll commands on top of a container:
docker run --name siteBuilder --rm -v $PWD:/site sitebuilder jekyll build
Running this command takes the Markdown files and other resources, and generates a site which is ready for deployment under the _site subdirectory.
The Deployment Process
I’d like to run the deployment process on top of a Docker container. Hence, we should prepare a custom image designated for deployment.
We’ll create a special directory called cicd under the site’s directory: mkdir cicd
While performing all the steps below, we’ll remain in the cicd directory.
The Dockerfile
Let’s begin with the Dockerfile itself, so we have a clearer picture of what our deployment script needs to do.
For now, let’s create an empty file named deployment.py. We’ll get back to it in the next step.
Let’s also create the requirements.txt file with the following contents:
boto3==1.20.15
Finally, let’s add the Dockerfile.deploy Dockerfile and populate its contents as specified below:
# A lightweight Python base image
FROM python:3.9.9-alpine3.14

# Deployment settings; these can be overridden at container runtime
ENV S3_BUCKET_NAME=www.yourdomain.com
ENV SITE_CONTENTS_PATH=/site
ENV AWS_SHARED_CREDENTIALS_FILE=/deploy/credentials

RUN apk update && apk upgrade
WORKDIR /deploy

# Install the dependencies, then copy in the deployment script
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY deployment.py ./
RUN chmod 755 deployment.py

# Make the deployment script the container's main process
ENTRYPOINT ["./deployment.py"]
Let’s dissect the Dockerfile now.
The Environment Variables
It’s good practice to avoid hard-coding values. Hence, I defined three environment variables from which the relevant data will be read.
Note that boto3 searches for the AWS credentials in the host OS or the container it runs in. One of the methods it uses is checking the AWS_SHARED_CREDENTIALS_FILE environment variable, which points to the credentials file.
The other two ENV statements specify which S3 bucket the files will be uploaded to and the path the files should be taken from, respectively. Replace the bucket name (S3_BUCKET_NAME) with your actual bucket name.
Another advantage of using environment variables is that their values can be overridden at container runtime.
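As a quick, optional illustration of how boto3 resolves the credentials through this variable, you can point AWS_SHARED_CREDENTIALS_FILE at a file and ask a boto3 session what it found. The snippet below is just a probe of my own (the file path is an example), not part of the deployment image:

#!/usr/local/bin/python
# credentials_probe.py - an optional illustration, not part of the deployment image
import os

import boto3

# Point boto3 at a custom credentials file, just like the Dockerfile's ENV statement does
os.environ['AWS_SHARED_CREDENTIALS_FILE'] = '/deploy/credentials'

session = boto3.Session()
credentials = session.get_credentials()  # None if boto3 couldn't resolve any credentials
print('Credentials resolved' if credentials else 'No credentials found')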
As for the rest of the Dockerfile, we employ the same methodology for containerizing an app as in the 1st post of the series.
Namely, we chose the most lightweight base image, and on top of it we installed the needed dependencies.
Finally, we passed the code to it and activated the main program at the container’s initialization by stipulating the container’s ENTRYPOINT.
The Deployment Script
The Dependencies
If we revisit the requirements.txt file, we see that currently there’s only one dependency, and it’s the boto3 package.
Show Me the Code
#!/usr/local/bin/python
from pathlib import Path
import os
import sys
import mimetypes

import boto3
from botocore.exceptions import ClientError

ENV_S3_BUCKET_NAME = 'S3_BUCKET_NAME'
ENV_SITE_CONTENTS_PATH = 'SITE_CONTENTS_PATH'

site_contents_file_path = Path(os.environ.get(ENV_SITE_CONTENTS_PATH))


def get_mimetype(object_path):
    content_type, encoding = mimetypes.guess_type(object_path)
    if content_type is None:
        return 'binary/octet-stream'
    return content_type


class BucketManager:
    def __init__(self, bucket_name):
        self._s3_client = boto3.client('s3')
        self._bucket_name = bucket_name

    def upload_files(self, object_path):
        # If the object is taken from the parent directory, then the key is the file name
        bucket_key = object_path.name
        # If the object originates from one of the subdirectories, its key would be the relative path
        # AWS S3 uses / to derive paths and create folders in the bucket itself
        if len(object_path.parts) > 3:
            bucket_key = "/".join(object_path.parts[2:])
        try:
            if object_path.is_file():
                sys.stdout.write(f'Uploading {bucket_key}\n')
                content_type = get_mimetype(object_path)
                self._s3_client.upload_file(str(object_path), self._bucket_name, bucket_key,
                                            ExtraArgs={'ContentType': content_type})
            if object_path.is_dir():
                for os_obj in object_path.iterdir():
                    self.upload_files(os_obj)
        except ClientError as e:
            sys.stderr.write(str(e))
            sys.stderr.flush()
            exit(1)  # return exit code 1 to the container
            return False
        return True


def upload_site_contents():
    s3_bucket_name = os.environ.get(ENV_S3_BUCKET_NAME)
    s3_manager = BucketManager(s3_bucket_name)
    s3_manager.upload_files(site_contents_file_path)
    sys.stdout.flush()


upload_site_contents()
The above script retrieves the environment variables from the container’s OS and uploads the files to the S3 bucket accordingly.
I used the BucketManager class in order to reuse an existing session, so we make fewer connections to AWS in the process.
The script is written based on the examples provided by AWS.
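To make the key-derivation logic in upload_files a bit more concrete, here’s a small standalone illustration of how pathlib breaks a path mounted under /site into parts and what bucket key comes out of it (the file names are made up for the example):

from pathlib import Path

# A file inside a subdirectory of /site, as in the deployment container
object_path = Path('/site/assets/css/main.css')
print(object_path.parts)                # ('/', 'site', 'assets', 'css', 'main.css')
print('/'.join(object_path.parts[2:]))  # assets/css/main.css - the bucket key

# A file sitting directly under /site has only 3 parts, so its key is just the file name
top_level = Path('/site/index.html')
print(len(top_level.parts))             # 3
print(top_level.name)                   # index.html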
A Note on Uploading the Files Programmatically
Based on the Content-Type header in the response, our browsers know how to deal with different types of data (for example, the text/html value tells the browser to render an HTML page for us, whereas text/css prompts the browser to style the page). Failing to specify the proper Content-Type header value will result in unexpected browser behavior while rendering your site (e.g. not styling the pages at all).
In most cases, servers can automatically detect the MIME type of the file being served to the user.
Unfortunately, as per this Stack Overflow response, the S3 service doesn’t add this metadata automatically if the files are uploaded by the boto3 client. Hence, you have to stipulate this piece of data in your script.
As a result, I also implemented the get_mimetype function to get the file’s MIME type from the mimetypes built-in module in Python. Afterwards, I pass this function’s output to the S3 client by adding the ExtraArgs argument to the upload_file function.
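As a quick illustration of what mimetypes returns for typical site files (the outputs in the comments are what I’d expect; guess_type relies on the platform’s MIME type tables, so less common extensions may come back as None):

import mimetypes

# guess_type returns a (type, encoding) tuple; the type is None for unknown extensions
print(mimetypes.guess_type('index.html'))        # ('text/html', None)
print(mimetypes.guess_type('assets/main.css'))   # ('text/css', None)
print(mimetypes.guess_type('fonts/site.woff2'))  # may be (None, None) on some platforms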
To finish this step, let’s build the image by running the following command:
docker build -t sitedeployment -f Dockerfile.deploy .
Run the Deployment Image
Let’s move back up to the site’s directory now and run the deployment container:
docker run \
-it --name deploy -v $PWD/_site:/site \
-v ~/.aws/credentials:/deploy/credentials:ro --rm \
sitedeployment
Note that we’re mounting the AWS credentials file stored locally on our computer into the container. Since the Dockerfile stipulates a custom location for the credentials, boto3 will look for them there.
Bringing Everything Together
As we discussed previously, running the container from the command line can be cumbersome.
Let’s relay all the settings to the docker-compose file we created in the previous part:
version: "3"
services:
  site:
    image: nginx:1.21.4-alpine
    volumes:
      - $PWD/_site:/usr/share/nginx/html
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - 80:80
    container_name: site
  sitebuilder:
    image: sitebuilder
    volumes:
      - $PWD:/site
    container_name: sitebuilder
    entrypoint: ["jekyll", "build"]
  deployment:
    image: sitedeployment
    volumes:
      - $PWD/_site:/site
      - ~/.aws/credentials:/deploy/credentials:ro
As for the sitebuilder service, note that the entrypoint statement was added. This way, Jekyll becomes the container’s main process, so when it runs, its exit code (status) becomes the container’s exit code.
The exit code will be used in further posts of the series to automatically determine whether the build was successful.
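Just as a rough preview of that idea (the upcoming posts may take a different approach), here’s one way a small Python wrapper could consume the exit code, relying on docker-compose’s --exit-code-from flag:

#!/usr/local/bin/python
# build_check.py - a rough sketch, not the final pipeline from the upcoming posts
import subprocess
import sys

# --exit-code-from makes docker-compose exit with the sitebuilder container's exit code
result = subprocess.run(['docker-compose', 'up', '--exit-code-from', 'sitebuilder', 'sitebuilder'])

if result.returncode == 0:
    print('Jekyll build succeeded')
else:
    sys.exit(f'Jekyll build failed with exit code {result.returncode}')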
To build the site, you can run the following command:
docker-compose up sitebuilder
This will run the container with all the mounted volumes and build the site.
To test the site, run the following command:
docker-compose up site
And finally, to deploy the site, issue this command:
docker-compose up deployment
Don’t forget to remove the containers with the docker-compose down command.
The source code is available here
Stay tuned for more posts…