
Building Pull Request-based ephemeral Preview environments on Kubernetes

~ Chinmay Naik

The CTO of a company calls you. The company has just migrated from Heroku to Amazon EKS on AWS. He's happy with the migration but wants you to rebuild Heroku's ephemeral preview apps (Heroku calls them "Review Apps") on Kubernetes.

You know you can use ArgoCD for this, but you're in for some surprises and complications!

He wants to build ephemeral preview apps for both frontend and backend repos.

Frontend repos are simple single-page apps (SPAs) built with Vue.js.
Backend repos are Python/Django services that use PostgreSQL, Redis, and MongoDB.

He adds a few more requirements that complicate things a bit.

He wants you to handle:

dependency management for services
database migrations & seed data for backend services
automated deletion of envs to save costs
integration with Jira and GitHub Deployments
and much more

All this while using existing tools as much as possible.

Your documentation-driven approach

You sign up for this work and start a doc that lists all the requirements and identifies the unknowns. You've built preview environments before. However, dependency management, database migrations, and seed data often need custom solutions because they depend on each team's context. Couple that with the constraint of using existing tools, and now you have some interesting engineering work!

You list the existing tools and processes:

Kustomize
GitHub Actions for CI
ArgoCD
Versioned DB scripts
AWS Secrets Manager for secrets, etc.

The next step is to build some POCs that turn the "known unknowns" into "knowns". You know the preview envs for the frontend repos are easy to create. For the backend apps, you first need answers to a few questions:

Do we create PostgreSQL, Redis, and MongoDB instances for each PR, or can they be shared?
Do we need the ability to point a preview service to another preview service? Or does it always point to the staging env?

You'll need to design the system based on answers to these and other questions.

So you do the grunt work: you write down all the questions, discuss the trade-offs with the CTO and other engineering leads, and finally come up with a solution that handles all these cases. Getting there takes a few POCs and some trial and error, but that's part of the process.

Ephemeral PR-based preview environment workflow

Here's how you design the workflow.

A developer opens a PR with the "Preview" label
The CI workflow (GitHub Actions) starts
ArgoCD watches the PR
ArgoCD creates the application deployment in K8s
The preview env's public endpoint is made available to devs and QA
ArgoCD deletes the resources when the PR is merged
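One way to wire up the ArgoCD steps of this workflow is an ApplicationSet with a Pull Request generator, which polls GitHub for open PRs carrying a given label and renders one Application per PR. The sketch below is a minimal illustration under stated assumptions, not the exact setup from this engagement: it builds the manifest as a Python dict so it can be emitted as JSON for `kubectl apply`. The Secret name `github-token`, the overlay path `deploy/overlays/preview`, the `<app>-pr-<number>` namespace scheme, and the registry are all hypothetical. It also assumes CI pushes an image tagged with the PR's head commit SHA.

```python
import json


def preview_applicationset(owner: str, repo: str, app: str, registry: str,
                           label: str = "Preview") -> dict:
    """Build an Argo CD ApplicationSet manifest with one Application per labeled PR.

    The pullRequest generator only returns open PRs that carry `label`. Once a PR
    is merged or closed it drops out of the generator's output, and the
    ApplicationSet controller deletes the matching Application.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": app + "-previews", "namespace": "argocd"},
        "spec": {
            "generators": [{
                "pullRequest": {
                    "github": {
                        "owner": owner,
                        "repo": repo,
                        # Hypothetical Secret holding a GitHub API token.
                        "tokenRef": {"secretName": "github-token", "key": "token"},
                        "labels": [label],  # only PRs with this label get a preview
                    },
                    "requeueAfterSeconds": 300,  # how often to re-poll GitHub
                },
            }],
            "template": {
                # {{number}} and {{head_sha}} are filled in by the generator per PR.
                "metadata": {
                    "name": app + "-pr-{{number}}",
                    # Deleting the Application also deletes its Kubernetes resources.
                    "finalizers": ["resources-finalizer.argocd.argoproj.io"],
                },
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": f"https://github.com/{owner}/{repo}.git",
                        "targetRevision": "{{head_sha}}",
                        "path": "deploy/overlays/preview",  # hypothetical Kustomize overlay
                        # Assumes CI pushed an image tagged with the PR's head SHA.
                        "kustomize": {"images": [f"{registry}/{app}:" + "{{head_sha}}"]},
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": app + "-pr-{{number}}",  # one namespace per PR
                    },
                    "syncPolicy": {
                        "automated": {"prune": True, "selfHeal": True},
                        "syncOptions": ["CreateNamespace=true"],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # Emit JSON, which `kubectl apply -f -` accepts as well as YAML.
    print(json.dumps(preview_applicationset(
        "acme", "orders-api", "orders-api", "registry.example.com/acme"), indent=2))
```

Giving each PR its own namespace keeps teardown simple: when the Application goes away, the finalizer removes everything it deployed.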

This flow works well for both frontend and backend repos.

Challenges

Here are the three main challenges you tackle along the way:

Seed data management
Dependency management for services
Keeping costs low for the Preview environments

Let's expand on the challenges further.

Seed data management

You create a custom PostgreSQL image preloaded with seed data. The seed data is version-controlled in Git, so devs can easily rebuild the PostgreSQL image whenever new data needs to be loaded.
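The official `postgres` Docker image runs the `*.sql` and `*.sh` files it finds in `/docker-entrypoint-initdb.d/` in sorted filename order, and only when it initializes an empty data directory. That makes it a natural place to bake seed data into a custom image. One pitfall with versioned DB scripts is that plain lexical sorting runs `V10` before `V2`. The sketch below assumes a hypothetical Flyway-style naming convention (`V<version>__<name>.sql`) and renames scripts with zero-padded prefixes while staging them into a build-context folder that the Dockerfile then copies into `/docker-entrypoint-initdb.d/`.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Hypothetical naming convention for the versioned DB scripts: V<version>__<name>.sql
VERSIONED = re.compile(r"^V(\d+)__(.+)\.sql$")


def initdb_name(filename: str) -> str:
    """Map 'V2__seed_users.sql' to '0002_seed_users.sql'.

    Zero-padding makes lexical order match version order (unpadded names
    would run V10 before V2). Assumes versions stay below 10000.
    """
    m = VERSIONED.match(filename)
    if m is None:
        raise ValueError(f"not a versioned script: {filename}")
    return f"{int(m.group(1)):04d}_{m.group(2)}.sql"


def stage_seed_scripts(src: Path, dest: Path) -> List[str]:
    """Copy versioned scripts from `src` into `dest`, renamed for execution order.

    `dest` is meant to be the build-context folder that the Dockerfile copies
    into /docker-entrypoint-initdb.d/. Files not following the convention are skipped.
    """
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in src.iterdir():
        if VERSIONED.match(path.name) is None:
            continue  # skip READMEs and anything else that isn't a versioned script
        target = dest / initdb_name(path.name)
        shutil.copyfile(path, target)
        staged.append(target.name)
    return sorted(staged)
```

Because the init scripts only run against an empty data directory, each fresh preview database starts from the same known seed state.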

Dependency management for services

For each PR, you run the database containers in that PR's preview namespace, so they are isolated from other PRs. If service A depends on service B, service A's preview points to service B's staging env by default, but devs can easily override this with a config change (for example, to point at service B's preview env instead).
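The "staging by default, override by config" rule can be expressed as a small lookup that a Django `settings.py` might call. This is a sketch under several assumptions: that staging runs in a `staging` namespace on the same cluster, that each preview runs in a `<service>-pr-<number>` namespace, that the cluster uses the default `cluster.local` domain, and that the override is a hypothetical environment variable such as `SERVICE_B_PREVIEW_PR` set in the PR's overlay. Only the `<service>.<namespace>.svc.cluster.local` DNS pattern is standard Kubernetes behavior.

```python
import os
from typing import Mapping, Optional


def dependency_url(dep: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the in-cluster base URL a preview service should use for dependency `dep`.

    Default: the dependency's staging deployment. Override: set <DEP>_PREVIEW_PR
    (hypothetical variable name) to a PR number to target that PR's preview instead.
    """
    env = os.environ if env is None else env
    override_key = dep.upper().replace("-", "_") + "_PREVIEW_PR"  # e.g. SERVICE_B_PREVIEW_PR
    pr = env.get(override_key)
    # Assumed namespace layout: "staging" for staging, "<dep>-pr-<n>" for previews.
    namespace = f"{dep}-pr-{pr}" if pr else "staging"
    # Standard Kubernetes Service DNS: <service>.<namespace>.svc.cluster.local
    return f"http://{dep}.{namespace}.svc.cluster.local"
```

For example, `SERVICE_B_URL = dependency_url("service-b")` resolves to staging unless the PR's config sets `SERVICE_B_PREVIEW_PR`.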

Keeping costs low for the Preview environments

To keep preview env costs in check, you suggest running them on Spot instances. You also delete all of a preview environment's resources when it's no longer actively used. That wraps up your work: the CTO is super happy and wants to keep working with you.
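Both cost levers can be sketched in a few lines. The first function builds a Kustomize strategic-merge patch that pins a preview Deployment to Spot capacity via the `eks.amazonaws.com/capacityType` node label, which EKS managed node groups set to `SPOT` or `ON_DEMAND`. This assumes a Spot managed node group exists. The second function is a hypothetical idle check: what counts as "activity" (last push, last HTTP request) and the 48-hour TTL are illustrative assumptions, and a scheduled job would decide how to tear the environment down when it returns `True`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Node label that EKS managed node groups set: "SPOT" or "ON_DEMAND".
CAPACITY_LABEL = "eks.amazonaws.com/capacityType"


def spot_patch(deployment: str) -> dict:
    """Kustomize strategic-merge patch that schedules a preview Deployment on Spot nodes."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": deployment},
        "spec": {"template": {"spec": {"nodeSelector": {CAPACITY_LABEL: "SPOT"}}}},
    }


def is_idle(last_activity: datetime, now: Optional[datetime] = None,
            ttl: timedelta = timedelta(hours=48)) -> bool:
    """True if the preview env has had no activity for longer than `ttl`.

    The activity signal and the 48h default are assumptions.
    Pass timezone-aware datetimes.
    """
    now = now if now is not None else datetime.now(timezone.utc)
    return now - last_activity > ttl
```

Spot nodes can be reclaimed at short notice, which is an acceptable trade-off for short-lived preview environments but not for staging or production.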

Are you a CTO or engineering leader like this one, looking to supercharge developer productivity?

If you're looking for a reliable engineering partner for all things Infra, DevOps, Observability, and Reliability, reach out to me on LinkedIn or Twitter.

We do Pragmatic Software Engineering - on Production. That's it!

I write stories like this one about software engineering. They don't follow a fixed schedule because I don't make them up. If you liked this one, you might love: 💰 Taming GCP networking cloud costs

Follow me on LinkedIn and Twitter for more such stuff.