Part of having mature infrastructure management is treating infrastructure as code. If your infrastructure is on a public cloud, Terraform is a good choice for that “code.” It’s platform-agnostic (at least as agnostic as you’re going to get), easy to learn, and easy to run. If you’re coming from something platform-specific, like AWS’s CloudFormation, you’ll probably feel right at home with Terraform in short order.
Terraform itself is pretty straightforward. The syntax for the main file is similar enough to JSON that it’s immediately readable and the muscle memory-induced errors stop after a few hours. Terraform’s documentation is pretty thorough, easy to browse, and it’s easy to find the exact page you want with a decently-formed Google search. Terraform’s examples aren’t quite as complete as CloudFormation’s, but the rest of the documentation is thorough and clear enough that you don’t really miss it.
Since Terraform tracks the state of your cloud infrastructure and stores it for future reference, you’re likely going to want some form of remote state management. Terraform offers a service that includes this, or you can configure your build and deployment pipeline to read and write state data to somewhere persistent and universally accessible. If your project consists of just you and some free tier infrastructure, Terraform’s default of writing state to a local file works well enough. But the moment anybody else is working with the environment, you’re going to want some sort of remote state management, so I recommend getting your environment state off your machine as soon as possible. If you don’t use some sort of network-based state file, you’ll want to make sure you run `terraform refresh` every time you try to do something with Terraform.
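For example, a remote backend backed by S3 with DynamoDB-based state locking takes only a few lines; the bucket, key, and table names below are placeholders for illustration:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-team-terraform-state"   # hypothetical bucket name
    key            = "myproject/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"           # optional: locks state during runs
  }
}
```

With this in place, every Terraform run reads and writes state from the shared bucket, so teammates (and your build pipeline) always see the current state of the environment.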
One of Terraform’s most useful features is the ability to import existing cloud resources through the handy-dandy `terraform import` command so that it can manage them going forward. You still need to define the resources in your Terraform files, but if you want to manage something with Terraform, you were going to have to do that anyway. It’s handiest when someone manually adds something to an existing environment that you want to manage in Terraform going forward (it’s not that importing full-fledged environments is hard per se, it’s just that you have to import them resource by resource). This means you don’t have to re-create an existing environment you worked hard to set up; just import it and you can manage and automate changes going forward.
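As a sketch of that workflow (resource and bucket names here are hypothetical): declare a skeleton resource in your .tf file first, then point `terraform import` at the existing resource’s real-world identifier:

```hcl
# 1. Declare the resource so Terraform has an address for it:
resource "aws_s3_bucket" "assets" {
  bucket = "my-existing-assets-bucket"
}

# 2. Attach the real-world resource to that address:
#      terraform import aws_s3_bucket.assets my-existing-assets-bucket
# 3. Run `terraform plan` and fill in any attributes it reports as changing.
```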
Using Terraform with AWS
Now, Terraform claims to be “cloud agnostic,” and it is, up to a point. You can run `terraform apply` on any .tf file and that same command will make sure the cloud environment matches the state described in that file, but at some point there has to be cloud-specific configuration, and that point is the .tf file itself. It’s effectively the Strategy Pattern of cloud infrastructure management: Terraform provides the wrapper, and the Terraform file itself provides the specific implementation details. Since almost all of my cloud experience is with AWS, that’s where my Terraform experience comes in too. If you’re using a different cloud, your mileage will vary only in the specifics of your main Terraform file – that’s the beauty of Terraform.
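Concretely, the cloud-specific part lives in the provider and resource blocks of the .tf file – swap these for another provider’s equivalents and the `terraform apply` workflow around them stays identical (the region and AMI values below are placeholders):

```hcl
provider "aws" {
  region = "us-east-1"
}

# The resource types are AWS-specific; the workflow around them isn't.
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t3.micro"
}
```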
The biggest thing I noticed was that a lot of stuff got taken care of behind the scenes when I used the AWS console that I had to do myself in Terraform – almost all of it IAM-based. There are a lot of little things that have to happen within IAM to get different AWS services talking to one another. If you go through AWS directly, it does a lot of that internal plumbing for you. If you use something like Terraform, you’re going to have to set a lot of permissions explicitly yourself. This isn’t a big enough deal to warrant not using Terraform (unless you’re committed to being all-in on AWS, in which case you might as well use CloudFormation, since it’s also pretty good at handling this stuff), but keep in mind that you’re going to be going back and adding permission entries so that service A can invoke service B.
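A typical example of that plumbing – granting API Gateway permission to invoke a Lambda, something the console wires up silently – looks roughly like this (the resource names are hypothetical, and the Lambda and API are assumed to be defined elsewhere in the file):

```hcl
resource "aws_lambda_permission" "allow_api_gateway" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.api.function_name
  principal     = "apigateway.amazonaws.com"
  # Scope the grant to this one API rather than all of API Gateway
  source_arn    = "${aws_api_gateway_rest_api.api.execution_arn}/*/*"
}
```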
All the extra plumbing you have to put into Terraform around permissions is really a function of just how granular you want your permission structure to be. IAM makes it fairly easy to limit permissions for various services, but that means defining more roles and policies in your Terraform file, which you then have to keep up with so you can link the right services to the right roles. On the other hand, creating one role with permissions to do everything any piece of your AWS environment would need is much easier, but defeats the purpose of using these roles for security. Personally, I still recommend dealing with all the roles, policies, and permissions, even if they end up being half or more of your Terraform file. All Terraform is doing here is exposing how customizable and modular building out your AWS infrastructure can be. Given how relatively easy (albeit verbose) that customization makes properly locking down your services from one another, I say the benefits outweigh the hassles.
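For a sense of what that verbosity buys you, here’s a minimal sketch of a least-privilege Lambda execution role (the names are illustrative):

```hcl
resource "aws_iam_role" "lambda_exec" {
  name = "my-app-lambda-exec"
  # Only the Lambda service may assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# Grant only what this function actually needs (here, just CloudWatch logging)
resource "aws_iam_role_policy_attachment" "lambda_logging" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
```

Every service-to-service interaction you lock down adds another block or two like these, which is where the “half your Terraform file” estimate comes from.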
Tips and tricks I’ve adopted
First and foremost, as with any large-scale project, don’t try to do it all (or even large chunks of it) at once. Add each piece as you need it – it’ll make both debugging and keeping track of each individual resource easier. For example, if you’re building an API Gateway and Lambda-based API, create the resource paths you need, then the HTTP methods they’ll listen for, and only then worry about integrating with Lambda. It’s what you’d likely do if you were building each piece out by hand in the AWS console, and there’s no reason not to do the same thing in Terraform.
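The first increments of that API might be just a path and a method, with the Lambda integration left for a later pass (names are hypothetical, and an `aws_api_gateway_rest_api.api` is assumed to be defined earlier in the file):

```hcl
resource "aws_api_gateway_resource" "orders" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  parent_id   = aws_api_gateway_rest_api.api.root_resource_id
  path_part   = "orders"
}

resource "aws_api_gateway_method" "get_orders" {
  rest_api_id   = aws_api_gateway_rest_api.api.id
  resource_id   = aws_api_gateway_resource.orders.id
  http_method   = "GET"
  authorization = "NONE"
}
```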
Once you start developing your Terraform file, make all your infrastructure changes via Terraform. The whole point of using something like Terraform is that it’s repeatable, so if it works when you run it, it should work when someone else runs it in a different environment (assuming environments are set up consistently). Think of it as testing your Terraform file in “production” – just without actually running the risk of crashing production. Also, by using Terraform to manage your infrastructure exclusively, that infrastructure becomes easier to change in response to new additions to your environment or code changes that need to be pushed: just tweak the Terraform file and all the impacted resources get updated the next time you run `terraform apply`.
As with any other type of programming, you’ll want to abstract specific details out to a variables file. This lets you keep environment-specific details out of your actual Terraform file, allowing you to keep the architecture of your different environments consistent. It’s probably not a bad idea to put every value that could be configurable into a variables file (I still have several hard-coded values that probably should be moved). That way, if you need to change something in just one environment (a smaller volume for a database in the QA environment, smaller instances for your application in the development environment, etc.), it’s an easy change to make without impacting any other running environments. While you’re at it, if you manage multiple environments yourself, you’ll want to take advantage of Terraform’s workspaces. These let you separate out the state of different environments even though you’re running Terraform for all of them from the same machine.
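A variables file plus the built-in `terraform.workspace` value makes those per-environment tweaks a one-line affair (the variable names and sizes here are illustrative):

```hcl
variable "db_volume_size" {
  description = "Database volume size in GB"
  type        = number
  default     = 100
}

# Use smaller volumes everywhere except production:
locals {
  db_volume_size = terraform.workspace == "prod" ? var.db_volume_size : 20
}
```

Switching environments is then just `terraform workspace select qa`, with each workspace keeping its own state.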
These next two tips are very AWS-specific (I’m deploying there, so that’s what I’m focusing on). If you’re setting up permission policies for your service, use AWS’s policy builder to build the policy, copy it into Terraform, then abstract out the specifics like ARNs. Security policy rules are hard enough to get right that trying to write them yourself by hand is effectively volunteering to bang your head into the wall. I get that the idea of using Terraform is that you’re not doing stuff inside AWS, but sometimes it helps to cheat and copy the working answer into Terraform.
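The end result of that copy-and-abstract step looks something like this – the policy JSON comes straight from the builder, with the ARN swapped for a variable (the role reference and variable name are hypothetical):

```hcl
resource "aws_iam_role_policy" "read_orders_table" {
  name = "read-orders-table"
  role = aws_iam_role.lambda_exec.id   # assumes a role defined elsewhere
  # Policy body pasted from AWS's policy builder, ARN abstracted out
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:GetItem", "dynamodb:Query"]
      Resource = var.orders_table_arn  # the abstracted-out ARN
    }]
  })
}
```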
If you’re running a lot of Lambdas, you can use Terraform to push code updates. Simply call `terraform taint` on the affected Lambdas and the next Terraform run will delete and re-create them. This shouldn’t be an issue outside of development and maybe QA, since elsewhere you’d be dealing with new code versions in your release tags, which cause Terraform to update the code anyway, but if you’re pushing updates of the same version snapshot, this can make your life easier. The same trick works for updating API Gateway deployments too, which don’t benefit from seeing updated deliverable names.
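In practice that’s a two-command loop against a function definition like the sketch below (the file paths, names, and runtime are hypothetical, and the execution role is assumed to be defined elsewhere); the optional `source_code_hash` line also lets Terraform notice changed code inside an identically-named zip on its own:

```hcl
# Force re-creation of a Lambda whose deliverable name hasn't changed:
#   terraform taint aws_lambda_function.api
#   terraform apply
resource "aws_lambda_function" "api" {
  function_name    = "my-api"
  filename         = "build/my-api.zip"
  # Optional: hashing the zip lets Terraform detect code-only changes itself
  source_code_hash = filebase64sha256("build/my-api.zip")
  handler          = "index.handler"
  runtime          = "nodejs18.x"
  role             = aws_iam_role.lambda_exec.arn
}
```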
Terraform is a useful infrastructure management tool, especially if you want to set up your automation pipeline to insulate yourself from changing cloud providers or just want the option of being multi-cloud. It’s the closest we’re probably going to get to true “cloud agnostic” provisioning software (short of public cloud providers standardizing their offerings, which I don’t see happening), easy to use, and (at least coming from CloudFormation) pretty familiar if you were already using a vendor-specific infrastructure management solution. As good as Terraform is, at some point you have to get into the nitty-gritty cloud-specific details of your infrastructure, and you’re going to need a plan for having multiple people use these files to manage a shared environment (or multiple environments depending on how your organization handles such things). But since Terraform is basically a script with input files, it integrates nicely with automation systems, and is absolutely worth using in your organization.