Coupling Terraform Workspaces with Git Branches
-
How do I make my Terraform code deploy to multiple environments without copy-pasting or adopting a complex folder structure like Terragrunt's, and how can we handle the small differences between each environment?
A little while back I had some ideas about how to do this in a generic way, with the additional requirements of having only one set of AWS keys, using GitLab CI/CD, and keeping the CI/CD YAML as simple as possible.
Okay, that's a lot, but the solution is surprisingly simple.
The idea here is to link git branches to an AWS account and split up the state files with Terraform workspaces. Sounds good, but how can we handle authentication?

    variable "workspace_iam_roles" {
      default = {
        default = "arn:aws:iam::xxxxxxxxxxx:role/terraform-sa-role"
      }
    }

    provider "aws" {
      region = var.region

      assume_role {
        role_arn = var.workspace_iam_roles[terraform.workspace]
      }
    }

Here providers.tf defines the relationship between a workspace and an account. In this example we only have one workspace, "default", which is linked to account xxxxxxxxxxx by an assumed role. This role was created earlier in the account with the name terraform-sa-role. We use the built-in terraform.workspace value to pick the role we need for that environment.
Adding environments looks like this:

    variable "workspace_iam_roles" {
      default = {
        default = "arn:aws:iam::xxxxxxxxxxx:role/terraform-sa-role",
        dev     = "arn:aws:iam::yyyyyyyyyyy:role/terraform-sa-role",
        preprod = "arn:aws:iam::zzzzzzzzzzz:role/terraform-sa-role",
        staging = "arn:aws:iam::zzzzzzzzzzz:role/terraform-sa-role"
      }
    }

By switching our Terraform workspace we are now telling Terraform to use a different role. In this example preprod and staging are on the same account.
But switching between workspaces is kind of a pain, so why not couple them to the git branch? That way when we are on the preprod branch, it will also use the preprod workspace, and by extension the preprod assumed role.
This can be achieved with this YAML snippet in GitLab:

    before_script:
      - |
        terraform init
        # master maps to Terraform's built-in "default" workspace
        if [ "${CI_COMMIT_REF_SLUG}" = "master" ]; then
          CI_COMMIT_REF_SLUG="default"
        fi
        # create the workspace if it does not exist yet, then switch to it
        terraform workspace new "${CI_COMMIT_REF_SLUG}" || true
        terraform workspace select "${CI_COMMIT_REF_SLUG}"

The GitLab CI script defines the relationship between branch and workspace, with a forced mapping of master -> default, as default is the default workspace and master is the default git branch. You can set this to whatever you want, for example if you use main instead of master.
Then in the GitLab interface I define a single set of AWS keys. These keys give access to a role that is permitted to assume the other roles defined in providers.tf, as mentioned earlier, and that is also permitted to make changes to the state files in S3. This is critical because the backend block in Terraform does not support string interpolation, so the state file is always stored wherever the default keys point.
These keys take care of the init phase (creating the state file) but do not make the actual changes. Instead, Terraform assumes the role defined in providers.tf, and that role makes the changes in the desired account.
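
To make the backend point concrete, here is a minimal sketch of what an S3 backend block might look like. The bucket name, key and region are placeholders invented for illustration, not values from this setup. Every value has to be a literal, because variables and terraform.workspace cannot be referenced inside this block.

```hcl
terraform {
  backend "s3" {
    # All three values are hypothetical placeholders.
    bucket = "example-terraform-state"
    key    = "example-project/terraform.tfstate"
    region = "eu-west-1"
  }
}
```

With the S3 backend's default settings, the state of a non-default workspace is stored in the same bucket under an env:/<workspace>/ prefix (for example env:/preprod/example-project/terraform.tfstate), which is why the one set of keys needs write access to that bucket.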
It's also not perfect: on feature branches it tends to fail at the plan step, as the feature branch is rarely defined in providers.tf (something I'd like to sort out some other time, maybe with some fuzzy searching? But then I want to keep everything as explicit and declarative as possible).
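
One possible way to soften this, at the cost of being less explicit, is to fall back to a known role when the workspace is not in the map. The sketch below is a variant of the provider block using Terraform's built-in lookup function. Sending unknown workspaces to the dev role is an assumption made purely for illustration, and the plan job would still need a matching tfvars file for the branch.

```hcl
provider "aws" {
  region = var.region

  assume_role {
    # Use the role mapped to the current workspace; if the workspace
    # (e.g. a feature branch) is not in the map, fall back to the dev role.
    # The choice of dev as the fallback is an assumption, not part of the
    # original setup.
    role_arn = lookup(
      var.workspace_iam_roles,
      terraform.workspace,
      var.workspace_iam_roles["dev"]
    )
  }
}
```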
What about the differences between environments?
We can use .tfvars files to handle that for us. With a project structure like:

    vars/
      default.tfvars
      preprod.tfvars
      production.tfvars

I use the branch name as the file name here so we can again make the mapping pretty easily within .gitlab-ci.yml:

    tf-plan:
      stage: tf-plan
      script:
        - terraform plan -var-file="vars/${CI_COMMIT_REF_SLUG}.tfvars" -out plan.tfplan
      artifacts:
        paths:
          - plan.tfplan

Since I defined the before_script to run before all jobs, this again handles the rewrite of master => default. So when using the master branch it will pick up default.tfvars. This file can contain the key differences between environments. For example, when the production environment requires 40 hosts in an auto-scaling group but preprod only needs 1, a variable such as desired_count can be defined in variables.tf and then set to 40 in production.tfvars and 1 in preprod.tfvars.
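
As a rough sketch of the desired_count example, the declaration in variables.tf could look like the following. The type and description are assumptions added for illustration.

```hcl
# variables.tf
variable "desired_count" {
  type        = number
  description = "Number of hosts in the auto-scaling group"
}
```

The value is then set per environment. For production, vars/production.tfvars would contain:

```hcl
# vars/production.tfvars
desired_count = 40
```

vars/preprod.tfvars would contain the same line with a value of 1. Because the variable has no default, a branch without a matching tfvars file fails at plan time rather than silently picking up a value.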
The upside is that you only need one set of keys, and it lets you make real use of git branching and merging. In the ideal scenario you make your changes in the staging branch -> deploy to staging -> check your changes in the staging account -> merge to master -> deploy to master -> check the changes in the main account.
But what if I want to do all this locally?
I created a simple Terraform wrapper in Python called Atmos. It supports a variety of authentication methods. It was originally created to rewrite the ~/.aws/credentials file on the fly, but it also supports the above method without any additional flags.