Deploy to Google App Engine using GitHub Actions (CI/CD)

Continuous Integration and Continuous Deployment (CI/CD) is one of the two core pieces of MLOps (the other being Data Version Control). Here we will look at how to use Google App Engine and GitHub Actions to achieve it.

Avinash Kanumuru
Geek Culture

Introduction

As part of a personal project, I was building a dashboard in Data Studio with data from BigQuery, which is populated by Python code hosted on App Engine. Some of the raw data files, config, and ML models are stored in Cloud Storage and versioned with Data Version Control (dvc).

Below is my post explaining Data Version Control with dvc in detail: how to set it up and use it with Google Cloud Storage, with a practical example.

Coming back to my Python code: it extracts data from various APIs on the internet, processes and analyzes it using the config and models stored in Cloud Storage, and then stores the final data in BigQuery datasets and Cloud Storage files. The code runs every day, triggered by an App Engine cron job.

This makes up my app/project, which runs seamlessly on various GCP components (by the way, I have been running this on GCP for almost a month at no cost, as everything stays within the free limits, not the free trial). The only missing piece in this setup was that when I push code to GitHub, the new version should be deployed to App Engine automatically. That completes the CI/CD stage of DevOps (in our case, MLOps).

CI/CD with GitHub Actions

I have two branches in my git repository, main and develop. I use develop to build or tune the model and then push the final model to Cloud Storage using dvc, so that previous versions of the model are kept in case I need to retrieve them later to compare results.

Once I’m satisfied with the model results, I merge develop into the main branch (a pull request would also work, but I’m the sole developer on this project). I then separately push the code to GAE from my local computer using the Google Cloud SDK.

So the idea is to automate this deployment to GAE whenever changes are committed to the main branch, using a CI/CD tool. I chose GitHub Actions for simplicity, but GitLab, Jenkins, or a similar tool would also work; this article covers CI/CD with GitHub Actions.

Below is a gist of the app.yaml and cron.yaml that I use to deploy to Google App Engine (GAE) from my local computer; we will reuse the same files in GitHub Actions.
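For readers who want a starting point, here is a minimal sketch of what such a pair of files could look like for a Python app on the App Engine standard environment. It is not my exact configuration: the runtime version, the instance class, the /run URL, and the job description are all illustrative assumptions, and the two YAML documents shown here belong in two separate files.

```yaml
# ----- app.yaml (separate file) -----
# Assumed: Python 3.9 on the App Engine standard environment.
runtime: python39
# Assumed: smallest instance class, to help stay within the free limits.
instance_class: F1

---
# ----- cron.yaml (separate file) -----
cron:
  # Hypothetical job: calls an endpoint of the app once a day.
  - description: "daily data refresh"
    url: /run                  # hypothetical handler path exposed by the app
    schedule: every 24 hours
```

With files like these in the project root, the cron job requests the given URL on the schedule, and the app's handler for that URL runs the daily batch code.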

Now create a file at .github/workflows/main.yml to configure our GitHub Actions workflow to deploy the code to GAE on a push to the main branch. Above is a gist of the configuration. It is fairly self-explanatory, but it took me some time to set up: I could not find clear documentation for it, and I had to piece the configuration together from different sources through numerous failed GitHub workflow runs.

The workflow has a name, followed by what triggers it: a push or pull request to the main branch. Then come the jobs that run on that trigger; here there is only deploy, but there could be more. The job runs on the latest Ubuntu runner with the following steps:

  1. check out the latest codebase using actions/checkout@v2
  2. deploy the code to App Engine using google-github-actions/deploy-appengine@v0.2.0
  3. test whether the app has been deployed successfully by running a curl command

In step 2, some additional parameters are needed for GitHub Actions to deploy correctly.

  1. deliverables: the files to deploy. In most cases this is just app.yaml, but I also schedule a batch job, so I pass both app.yaml and cron.yaml.
  2. version: (optional) the version name to use in App Engine. If omitted, App Engine creates a version named after the current timestamp.
  3. project_id: required so that GitHub Actions deploys to the correct project.
  4. credentials: the JSON content of the service account key file downloaded from GCP.
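Putting the trigger, the job, the three steps, and these parameters together, a workflow file could look roughly like the sketch below. Treat it as an illustration rather than my exact file: the workflow and step names, the secret names GCP_PROJECT and GCP_CREDENTIALS, and the version value v1 are assumptions, and the curl step assumes the deploy action's url output.

```yaml
# .github/workflows/main.yml (sketch; names and secrets are assumptions)
name: Deploy to App Engine

# Trigger on a push or pull request to the main branch.
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Step 1: check out the latest codebase.
      - name: Checkout code
        uses: actions/checkout@v2

      # Step 2: deploy to App Engine with the parameters described above.
      - name: Deploy to App Engine
        id: deploy
        uses: google-github-actions/deploy-appengine@v0.2.0
        with:
          deliverables: app.yaml cron.yaml
          version: v1                                   # optional
          project_id: ${{ secrets.GCP_PROJECT }}        # assumed secret name
          credentials: ${{ secrets.GCP_CREDENTIALS }}   # assumed secret name

      # Step 3: request the deployed app's URL to check that it responds.
      - name: Test deployment
        run: curl "${{ steps.deploy.outputs.url }}"
```

Note that secrets are not passed to workflows triggered by pull requests from forks, so the pull_request trigger is mainly useful for branches within the same repository.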

For added security, we can store project_id and credentials as GitHub secrets and reference them in the workflow. To add a GitHub secret, go to Settings of your project repo > Secrets > Add Secret. Copy the contents of the service account JSON file and paste them in as the credentials secret.

The above config is similar to the gcloud command we use to deploy from the local computer, except that the credentials are stored as a secret and exposed through the GOOGLE_APPLICATION_CREDENTIALS environment variable (an environment variable, not an entry on PATH).

gcloud app deploy app.yaml cron.yaml --version=v1 --project=<project-id>

Below is the successful run of the GitHub Actions workflow that deployed the codebase to GAE.

Conclusion

A CI/CD pipeline is an essential part of DevOps (and likewise of MLOps) for streamlining and automating the deployment process. We used GitHub Actions to deploy our project to Google App Engine automatically whenever code is pushed to a specific branch. With this pipeline in place, we can focus on the core work of the project: software development or model building.
