Back Up Your AWS CodeCommit Repository to Amazon S3 Using AWS Lambda, Part 1

If you want to create a backup of your AWS CodeCommit repository, you may be looking for a flexible way to copy its files to Amazon S3. With AWS Lambda and Python, this is straightforward to do.

In this article, we will discuss how to set up pushing AWS CodeCommit repository files to S3. We will use a two-Lambda-function approach that will eventually zip the uploaded files and produce a zip archive URL we can use in other projects.

Table of Contents:

  1. Create S3 bucket
  2. Create CodeCommit repo
  3. Connect & push code to repo
  4. Setup Lambda function
  5. Upload python code
  6. Test the function

1. Create S3 bucket

First, let’s create the S3 bucket that will hold the contents of the CodeCommit repository. In the AWS console, go to the S3 service and click “Create bucket”. Enter the bucket name and region and, if you like, copy the settings from an existing bucket.
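If you prefer to script this step, a minimal boto3 sketch looks like the following; the bucket name and region here are placeholders, so swap in your own:

import boto3

s3 = boto3.client('s3', region_name='us-west-2')

# Create the backup bucket (omit CreateBucketConfiguration if you use us-east-1)
s3.create_bucket(
    Bucket='my-codecommit-backup-bucket',
    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
)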

On the next screens, you set the options and permissions for the bucket. The key here is to allow read and write access to objects in your bucket using a bucket policy such as:

{
    "Version": "2012-10-17",
    "Id": "SOMEPOLICYIDHERE",
    "Statement": [
        {
            "Sid": "SOMESIDHERE",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::REPONAMEHERE/*"
        }
    ]
}

You can paste this code into the bucket policy tab, replacing the ID, SID, and bucket name. This policy allows AWS Lambda to get and put objects in the S3 bucket.
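If you would rather apply the policy from code than paste it into the console, a sketch along these lines does the same thing (the bucket name and Sid are placeholders):

import json
import boto3

s3 = boto3.client('s3')

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowLambdaGetPut",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::my-codecommit-backup-bucket/*"
    }]
}

# Attach the policy to the backup bucket
s3.put_bucket_policy(
    Bucket='my-codecommit-backup-bucket',
    Policy=json.dumps(bucket_policy)
)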

2. Create CodeCommit Repository

Next, we will create the CodeCommit repository. If you already have a repository you want to back up, skip ahead to setting up the Lambda function.

Go to the AWS CodeCommit service in the region where you will be building this solution and click “Create repository”. Enter the repository name, an optional description, and any tags for the repository.
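The same step can be done from code; here is a rough boto3 sketch with a placeholder repository name and description:

import boto3

codecommit = boto3.client('codecommit', region_name='us-west-2')

# Create the repository to back up (name and description are placeholders)
response = codecommit.create_repository(
    repositoryName='my-backup-source-repo',
    repositoryDescription='Repository backed up to S3 by a Lambda function'
)
print(response['repositoryMetadata']['cloneUrlHttp'])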

3. Connect & push code to repo

You will need at least one commit on the appropriate branch of this repository so that the branch can be registered as the trigger for the Lambda function.

Before continuing, make sure you have read about how to connect to a CodeCommit repository in this article. This is necessary for the branch to appear when you set up the Lambda function’s trigger.
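Once you have pushed at least one commit, you can quickly confirm that the branch is visible to the CodeCommit API; this sketch assumes a placeholder repository name and a master branch:

import boto3

codecommit = boto3.client('codecommit', region_name='us-west-2')

# Confirm the branch exists and points at a commit
branch = codecommit.get_branch(
    repositoryName='my-backup-source-repo',
    branchName='master'
)
print(branch['branch']['commitId'])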

4. Setup Lambda function

We will need a custom AWS IAM role that allows our function to read from CodeCommit and write to S3. Go to AWS IAM, choose Roles on the left-hand side, and click “Create role” to start making a custom role for the Lambda function, choosing AWS Lambda as the service that will use the role.

On the permissions page, attach two managed policies to the custom role: AWSCodeCommitReadOnly and AWSLambdaExecute.
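For reference, here is roughly what that role looks like when created with boto3; the role name is a placeholder, and the trust policy simply lets the Lambda service assume the role:

import json
import boto3

iam = boto3.client('iam')

# Trust policy that lets the Lambda service assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName='codecommit-to-s3-backup-role',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Attach the two managed policies used in this article
for policy_arn in [
    'arn:aws:iam::aws:policy/AWSCodeCommitReadOnly',
    'arn:aws:iam::aws:policy/AWSLambdaExecute',
]:
    iam.attach_role_policy(
        RoleName='codecommit-to-s3-backup-role',
        PolicyArn=policy_arn
    )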

Next, let’s set up the Lambda function by going to AWS Lambda and clicking “Create function”. On the next screen, enter your function name and runtime (for this article we will be using Python 3.7), and under the permissions settings choose the custom role you created above.
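If you would rather create the function from code, a minimal sketch looks like this; the function name, role ARN, and deployment package are placeholders, and the zip is assumed to contain lambda_function.py:

import boto3

lambda_client = boto3.client('lambda', region_name='us-west-2')

# Create the function from a local deployment package
with open('function.zip', 'rb') as f:
    lambda_client.create_function(
        FunctionName='codecommit-to-s3-backup',
        Runtime='python3.7',
        Role='arn:aws:iam::123456789012:role/codecommit-to-s3-backup-role',
        Handler='lambda_function.lambda_handler',
        Code={'ZipFile': f.read()}
    )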

Afterwards, set up the trigger for the Lambda function by clicking “Add trigger”. Select CodeCommit as the trigger type, then choose the repository, a trigger name, the events that activate the trigger, the branch name, and any custom data, and click “Add”.
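Behind the scenes, the console trigger boils down to giving CodeCommit permission to invoke the function and registering a repository trigger; a rough sketch with placeholder names and ARNs:

import boto3

codecommit = boto3.client('codecommit', region_name='us-west-2')
lambda_client = boto3.client('lambda', region_name='us-west-2')

# Allow CodeCommit to invoke the function
lambda_client.add_permission(
    FunctionName='codecommit-to-s3-backup',
    StatementId='codecommit-trigger',
    Action='lambda:InvokeFunction',
    Principal='codecommit.amazonaws.com',
    SourceArn='arn:aws:codecommit:us-west-2:123456789012:my-backup-source-repo'
)

# Register the trigger on the repository branch
codecommit.put_repository_triggers(
    repositoryName='my-backup-source-repo',
    triggers=[{
        'name': 'backup-to-s3',
        'destinationArn': 'arn:aws:lambda:us-west-2:123456789012:function:codecommit-to-s3-backup',
        'branches': ['master'],
        'events': ['all']
    }]
)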

5. Upload python code

Now we are ready to add the Python code that reads the repository contents and uploads them to S3 as individual files.

import boto3
import os
import mimetypes


# Get the list of files (blobs) in the CodeCommit branch
def get_blob_list(codecommit, repository, branch):
    response = codecommit.get_differences(
        repositoryName=repository,
        afterCommitSpecifier=branch,
    )
    blob_list = [difference['afterBlob'] for difference in response['differences']]
    # Page through the results if the repository has many files
    while 'nextToken' in response:
        response = codecommit.get_differences(
            repositoryName=repository,
            afterCommitSpecifier=branch,
            nextToken=response['nextToken']
        )
        blob_list += [difference['afterBlob'] for difference in response['differences']]
    return blob_list


def lambda_handler(event, context):
    # Get the S3 bucket destination
    bucket = boto3.resource('s3').Bucket(os.environ['bucketName'])

    # Get the CodeCommit source
    codecommit = boto3.client('codecommit', region_name=os.environ['region'])
    repository_name = os.environ['repository']

    # Read each file and save it to the S3 bucket
    for blob in get_blob_list(codecommit, repository_name, os.environ['branch']):
        path = blob['path']
        content = codecommit.get_blob(repositoryName=repository_name, blobId=blob['blobId'])['content']
        # Use the correct content type when it can be guessed from the file name
        content_type = mimetypes.guess_type(path)[0]
        if content_type is not None:
            bucket.put_object(Body=content, Key=path, ContentType=content_type)
        else:
            bucket.put_object(Body=content, Key=path)

This code uses get_blob_list to build a list of the files in the CodeCommit branch. It calls get_differences with only an afterCommitSpecifier, so every file currently on the branch is returned, paginating with nextToken for larger repositories. The handler then gets the S3 bucket and the CodeCommit repository and puts each file into the S3 bucket with the correct MIME type when one can be guessed. Note that the function reads four environment variables (bucketName, region, repository, and branch), so set those in the Lambda configuration before running it.
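If you prefer to set those environment variables from code rather than in the console, a sketch like this works (all values are placeholders):

import boto3

lambda_client = boto3.client('lambda', region_name='us-west-2')

# Set the environment variables read by the function
lambda_client.update_function_configuration(
    FunctionName='codecommit-to-s3-backup',
    Environment={
        'Variables': {
            'bucketName': 'my-codecommit-backup-bucket',
            'region': 'us-west-2',
            'repository': 'my-backup-source-repo',
            'branch': 'master'
        }
    }
)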

Make sure you save the function after copying this code in.

6. Test the function

Set up a test by clicking the “Test” button in the top-right corner. On this screen, replace the event JSON with an empty object ({}), since the function does not use the event payload. Save the test, then click “Test” again to run the function.
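You can also run the same empty-payload test from code; a small sketch with a placeholder function name:

import boto3

lambda_client = boto3.client('lambda', region_name='us-west-2')

# Invoke the function with an empty event, just like the console test
response = lambda_client.invoke(
    FunctionName='codecommit-to-s3-backup',
    InvocationType='RequestResponse',
    Payload=b'{}'
)
print(response['StatusCode'])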

If all goes well, you should see your CodeCommit files copied over to the S3 bucket you set up. That’s it for Part 1 of our tutorial!

In Part 2, we will discuss how to modify the Lambda Python code to zip the files before pushing them to S3, so that you can store a zip archive of your CodeCommit repository.
