countingup.com

Managing GitHub with Terraform


Rich Keenan

Terraform is a tool for managing infrastructure as code. It allows you to define your infrastructure in a declarative way, and then apply it to an environment. It's a great tool for managing infrastructure, but you might not realise that it's also a great tool for managing GitHub.

Terraform can manage GitHub?

Terraform is typically used to define cloud resources. At Countingup we use AWS, so we use it to define our EC2 instances, RDS databases, and S3 buckets.

If you've only ever used Terraform for a single cloud provider, you might not have realised that Terraform itself doesn't know anything about AWS* (or GCP, or Azure...). Terraform knows about resources, dependencies and state management. I really like the lifecycle diagram in Terraform In Action for understanding what Terraform is actually doing.

* This isn't strictly true, because it knows how to use S3 for storing and retrieving the terraform.tfstate file, but it doesn't expose a generic S3 resource to user land.

To use Terraform to manage specific resources you need to install a provider for that resource, so for AWS you need to install the "AWS provider". This provider manages AWS resources through the AWS API using the Go SDK.

Generally speaking, if something can be managed through an API, it can be managed through Terraform.

GitHub has an API for managing its resources - repositories, users, teams, etc. - and that means it's possible to manage GitHub resources with Terraform too, as long as there's a provider (or we're willing to write one).

Terraform GitHub provider

Luckily, there's a community-supported GitHub provider as part of GitHub's Integrations organisation that does exactly what we need 🎉.

OK, but, like why? 🤨

There's a whole list of reasons why you might want to manage GitHub with Terraform and it's the same list of reasons why you might want to manage any other infrastructure with Terraform:

  • Decreased risk of errors - Manually clicking and typing in the UI is error-prone
  • Increased visibility - All changes are reviewed and tracked in version control
  • Increased consistency - All changes are applied in the same way across all repositories
  • Single source of truth - All changes are applied through Terraform, not through the UI

Countingup doesn't have a mono-repo. Each backend service has its own repo, and there are various frontend and tooling repositories. At the time of writing, we have about 100 all in. Ensuring consistency and correctness between those is hard to do without automation, so a little over a year ago we decided to start using Terraform to manage our GitHub organisation.

Setup

I won't go into setting up Terraform or managing the state file; I'll jump straight into the GitHub-specific bits.

First, create a GitHub Personal Access Token with appropriate permissions, then set it as an environment variable:

export GITHUB_TOKEN=<token>

Then add the provider configuration to your Terraform file.

provider "github" {
  owner = "Countingup"
  // You can set the token here instead, but it would then be visible to anyone who can read this file
}
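Terraform 0.13 and later resolve providers by source address, so the configuration most likely also needs a required_providers entry pointing at the integrations namespace mentioned earlier. The sketch below shows that, plus one alternative to the GITHUB_TOKEN environment variable: passing the token through a sensitive variable. The variable name github_token is an assumption for illustration, and the provider block here would replace the one above rather than sit alongside it.

```hcl
terraform {
  required_providers {
    github = {
      # The community-supported provider lives in the "integrations" namespace.
      # Pinning a version here is advisable in practice.
      source = "integrations/github"
    }
  }
}

# Hypothetical variable name. Supply the value via TF_VAR_github_token or a
# secrets manager rather than committing it.
variable "github_token" {
  type      = string
  sensitive = true
}

provider "github" {
  owner = "Countingup"
  token = var.github_token
}
```

Marking the variable as sensitive keeps the value out of plan output, although it can still end up in the state file, so the state needs protecting either way.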

Resources

Once the provider is set up, resources are defined the same way as any other Terraform resource: you declare a resource with a type prefixed with the provider name.

resource "github_repository" "my_repo" {
  name       = "my-repo"
  visibility = "private"
}

Note that all of the supported resource types and their attributes are documented in the GitHub provider documentation.

Here are a few of the resources we manage.

Repositories

We rely heavily on Terraform modules to manage our GitHub repositories as there are a lot of options and not a lot of differences between our repositories. Here's a slightly trimmed down version of what a github_repository looks like for us:

resource "github_repository" "repo" {
  name               = var.name
  visibility         = "private"
  topics             = concat(local.defaultTopics, var.topics)
  allow_merge_commit = false
  allow_rebase_merge = false
  allow_squash_merge = true
  auto_init          = true
  archive_on_destroy = true
}

  • name - Module variable as this obviously changes per repo
  • visibility - None of our public repositories are managed by Terraform
  • topics - We set some default topics based on module options but also allow arbitrary topics
  • allow_merge_commit et al. - Ensures consistent merge strategy between repos
  • auto_init - Creates a README.md on master with title and description
  • archive_on_destroy - A terraform destroy action will archive the repository rather than delete it. This is a very nice safety feature.
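For context, here is a rough sketch of the inputs a module like this might declare. The names name, topics and defaultTopics come from the resource above. The is_backend_service option and the contents of the default topic list are assumptions for illustration, not our actual configuration.

```hcl
variable "name" {
  type        = string
  description = "Repository name"
}

variable "topics" {
  type        = list(string)
  description = "Extra topics to add on top of the defaults"
  default     = []
}

# Hypothetical module option used to pick default topics
variable "is_backend_service" {
  type    = bool
  default = false
}

locals {
  # Hypothetical defaults; the real list is not shown in this post
  defaultTopics = concat(
    ["terraform-managed"],
    var.is_backend_service ? ["backend"] : [],
  )
}
```

A repository would then be declared with a short module call (the source path and values are also hypothetical):

```hcl
module "my_service" {
  source             = "../modules/repository"
  name               = "my-service"
  topics             = ["payments"]
  is_backend_service = true
}
```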

Branch Protection

We have branch protection rules enabled for all our repositories. This adds an extra layer of access control to our code, and the rules are consistent across all of our backend service repositories.

The syntax is a little complex; there's probably a better way of writing this in Terraform, but this works for us.

resource "github_branch_protection_v3" "protection" {
  # Support multiple branches
  for_each   = var.protected_branches
  repository = github_repository.repo.name
  branch     = each.key

  restrictions {
    # Specify teams with push-access, default to none
    teams = try(each.value.push_access_teams, [])
  }

  dynamic "required_pull_request_reviews" {
    # For each branch that requires pull requests...
    for_each = each.value.require_pull_requests ? [1] : []
    content {
      # ...require at least one review, etc
      required_approving_review_count = 1
      require_code_owner_reviews      = true
      dismiss_stale_reviews           = true
    }
  }
}

This resource is defined in a Terraform module with a default value for the protected_branches variable that most of our repositories don't override:

variable "protected_branches" {
  description = "Set protection rules"
  default = {
    master = {
      require_pull_requests  = true

      # No direct pushes to master
      push_access_teams      = []
    }
  }
}
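For the few repositories that do override it, the call might look something like the sketch below. It assumes the repository module exposes protected_branches as an input. The module path, the release branch and the release-managers team slug are all made up for illustration.

```hcl
module "my_service" {
  source = "../modules/repository" # hypothetical path
  name   = "my-service"

  protected_branches = {
    master = {
      require_pull_requests = true
      push_access_teams     = []
    }

    # Hypothetical second branch: no pull request requirement, but only
    # one team is allowed to push to it.
    release = {
      require_pull_requests = false
      push_access_teams     = ["release-managers"]
    }
  }
}
```

Each key becomes one github_branch_protection_v3 instance through for_each, and the dynamic block only emits required_pull_request_reviews for branches where require_pull_requests is true.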

Organisation and Team access

When a new developer starts at Countingup we add them to the developers team and any subteams that they need to be in. After applying the change, GitHub sends an invite to the user, assuming there are enough seats in our organisation account. We almost always forget to check that, and sadly GitHub doesn't have a public API to manage seats, so we can't include it in the Terraform.

Our teams module looks like this:

resource "github_team" "team" {
  name           = var.name
  description    = var.description
  privacy        = "closed"
  parent_team_id = var.parent_team_id
}

resource "github_team_membership" "team_members" {
  for_each = toset(var.members)
  team_id  = github_team.team.id
  username = each.value
  role     = "member"
}

resource "github_team_membership" "team_maintainers" {
  for_each = toset(var.maintainers)
  team_id  = github_team.team.id
  username = each.value
  role     = "maintainer"
}

This lets us easily specify who is in what team and what role they have within that team.

And an example of how we use it:

module "developers_team" {
  source      = "../modules/team"
  name        = "Developers"
  description = "All Countingup Developers"
  maintainers = [
    "username_1",
    "username_2",
  ]
  members = [
    "username_3",
    "username_4",
  ]
}
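Subteams can be wired up through parent_team_id. One way to do that, assuming the team module exports the team's ID (the output name id and the Backend team are assumptions, not part of the module shown above), is roughly:

```hcl
# In the team module: expose the team ID so other teams can reference it
output "id" {
  value = github_team.team.id
}

# In the root configuration: a hypothetical subteam of Developers
module "backend_team" {
  source         = "../modules/team"
  name           = "Backend"
  description    = "Backend developers"
  parent_team_id = module.developers_team.id
  maintainers    = ["username_1"]
  members        = ["username_3"]
}
```

Referencing module.developers_team.id also gives Terraform the dependency it needs to create the parent team before the subteam.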

Downsides

It's been about a year since we started using Terraform to manage GitHub. It's mostly been an excellent decision, but there are some downsides.

  • It's slow - The GitHub API seems to be quite slow and the way some of the resources are managed by the provider isn't optimised. A terraform plan can take about 5 minutes to run, which gets really annoying if you've made a mistake and need to re-run it. There's an open issue on GitHub that has some interesting suggestions for fixes that we should probably look into, but we don't tend to make frequent changes, so it's never been a high enough priority.
  • We can still make changes in the UI - There's a really strong culture around not making changes to our AWS infrastructure using the AWS console - even for our development environment. This is very much not the case for GitHub, so things can go out of sync. This tends to show up most often when a team member joins or leaves and we need to update the team membership quickly. There's some balance between pragmatism and consistency here that we could do better on.
  • We don't manage everything with Terraform - Not all repositories are managed by Terraform; sometimes it just doesn't make sense for prototypes, one-offs, etc. This has led to confusion where some people aren't sure which repos are managed and which aren't. We added the terraform-managed topic to the default set to help solve this problem.
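If a one-off repository later turns out to be worth managing, it doesn't have to be recreated. As a sketch, assuming Terraform 1.5 or later and a repository called my-repo that already exists on GitHub, an import block can adopt it into state:

```hcl
# Requires Terraform 1.5 or later
import {
  to = github_repository.my_repo
  id = "my-repo" # the repository name, assumed to already exist on GitHub
}

resource "github_repository" "my_repo" {
  name       = "my-repo"
  visibility = "private"
}
```

The next terraform plan then shows the import along with any differences between the declared settings and what was set up by hand in the UI.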

Future

Once you realise that "if something can be managed through an API, it can be managed through Terraform", you start to see possibilities everywhere.

We use Auth0 for our Company Formations product. This was set up through the Auth0 web UI but there's totally an official Terraform provider for this. We could (and probably should) be using this to manage our Auth0 configuration and I suspect we'll be taking a look at this soon.

We use Segment for moving data between our products and our analytics tools. This is manually configured through their web UI and it's getting a little unwieldy. Unfortunately, there isn't an officially supported Terraform provider and there doesn't seem to be a de facto community provider either. I think this is a great opportunity for Segment to invest in the community and help to develop a fully featured provider - I suspect we'll have developers eager to contribute.

Whilst I'm waiting for the slow terraform plan to finish, maybe I'll use Nat Henderson's (in-)famous Dominos Terraform provider to order a margherita 🍕.