Will platform engineering be the mass-reproducible secret to great software development?

General Commentary, Programming

Nov 30, 2022

When people talk about the “death of DevOps,” platform engineering is brought up as its successor. That’s probably overstating things. The practices associated with platform engineering certainly look like they have a lot to offer, but getting platform engineering right is difficult. And getting platform engineering right is important, because that’s the only way platform engineering is going to work. Otherwise, what you’re going to end up with is a mashed-up team of random engineers desperately trying to keep infrastructure afloat while developers wreak havoc on everything.

What is platform engineering?

Generally speaking, platform engineering refers to an engineering discipline that involves building what’s needed to make it as easy as possible for developers to actually run their software. If you’ve heard of the paved road concept, then you can think of platform engineering as doing the paving. Yes, platform engineering is the new buzzword, and the paved road concept has been around for a while – don’t let anyone tell you that re-branding can’t work.

You can also think of platform engineering as what an actual DevOps team would look like (as opposed to being a glorified tooling team). Instead of doing a massive shift in organizational culture, platform engineering aims to centralize the DevOps practices and build them into company infrastructure, so all developers can reap the benefits even if they aren’t teaming with operations directly.

So what does a platform engineering team actually do? The idea is that it builds and maintains an internal development platform. For actually writing code, that means project templates that auto-include dependencies for things like logging, metrics, and configuration management. For running said code, it could be standardized Dockerfiles, VM images, or even Kubernetes deployments, complete with any 3rd-party agents/sidecars for monitoring, logging configured to rotate and push to your log aggregation tool of choice, and integration with whatever tool you use for injecting secrets into your running application.
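To make the "project template" idea a bit more concrete, here is a minimal sketch of the kind of startup helper a template might ship with. Everything in it is an assumption for illustration: the names `ServiceConfig`, `load_config`, `JsonFormatter`, and `bootstrap_service`, and the environment variables `SERVICE_NAME`, `SERVICE_ENV`, and `LOG_LEVEL`, are hypothetical and not taken from any particular platform. It only uses the Python standard library.

```python
import json
import logging
import os
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Settings every templated service reads the same way (hypothetical names)."""
    service_name: str
    environment: str
    log_level: str


def load_config(env=None):
    """Read configuration from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return ServiceConfig(
        service_name=env.get("SERVICE_NAME", "unnamed-service"),
        environment=env.get("SERVICE_ENV", "dev"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so a log aggregator can parse it."""

    def __init__(self, config):
        super().__init__()
        self._config = config

    def format(self, record):
        return json.dumps({
            "service": self._config.service_name,
            "env": self._config.environment,
            "level": record.levelname,
            "message": record.getMessage(),
        })


def bootstrap_service(env=None, stream=None):
    """What a template-generated service might call once at startup."""
    config = load_config(env)
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(JsonFormatter(config))
    logger = logging.getLogger(config.service_name)
    logger.handlers = [handler]   # replace handlers so repeated calls don't duplicate output
    logger.setLevel(config.log_level)
    logger.propagate = False      # keep the root logger from printing the line twice
    return config, logger


if __name__ == "__main__":
    config, log = bootstrap_service({"SERVICE_NAME": "orders", "SERVICE_ENV": "staging"})
    log.info("service started")
```

The point of a helper like this is less the code itself than the fact that every service gets the same logging and configuration behavior without each team having to think about it.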

Platform engineering would also include things like supporting non-application updates, security monitoring and alerting for vulnerable and/or outdated dependencies, a clear list of what snapshot of the code is running in what environments, configuration and settings management (so you don’t have to re-deploy everything just to change a configuration value), and rules to prevent things like pushing untested code straight into production.
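Two of the items above, knowing which snapshot of the code is running where and blocking untested code from reaching production, can be sketched in a few lines. This is a toy model under stated assumptions, not how any real deployment tool works: the `Build` and `Platform` classes, the `dev`/`staging`/`production` promotion order, and the rule that a commit must already be running in the previous stage are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Build:
    """A snapshot of the code plus whether its test suite passed."""
    commit: str
    tests_passed: bool


@dataclass
class Platform:
    # Promotion order is an assumption; real pipelines vary.
    stages: tuple = ("dev", "staging", "production")
    # Inventory of what is running where: environment name -> commit.
    running: dict = field(default_factory=dict)

    def can_deploy(self, build, environment):
        """Return (allowed, reason) for deploying a build to an environment."""
        if environment not in self.stages:
            return False, f"unknown environment {environment!r}"
        index = self.stages.index(environment)
        if index == 0:
            return True, "ok"  # the first stage is open to any build
        if not build.tests_passed:
            return False, "tests have not passed for this commit"
        previous = self.stages[index - 1]
        if self.running.get(previous) != build.commit:
            return False, f"commit is not running in {previous} yet"
        return True, "ok"

    def deploy(self, build, environment):
        """Record the deployment, or refuse if a rule is violated."""
        allowed, reason = self.can_deploy(build, environment)
        if not allowed:
            raise PermissionError(reason)
        self.running[environment] = build.commit


if __name__ == "__main__":
    platform = Platform()
    build = Build(commit="abc123", tests_passed=True)
    print(platform.can_deploy(build, "production"))  # refused: not in staging yet
    for stage in platform.stages:
        platform.deploy(build, stage)
    print(platform.running)
```

Because the rules live in the platform rather than in a wiki page, a developer can still push their own code when it is ready; they just can't skip the steps the platform enforces.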

It’s worth noting that, even with a self-service platform that lets developers own and run their code in production, there are some things (e.g. production databases, and even the platform the code is running on itself) that developers shouldn’t have full and direct access to. You’ll still need to limit the ability to directly update databases from a terminal, or to log into production instances and tweak stuff, to a small handful of individuals. That’s not nearly the drawback it sounds like, as a good platform would include tools for basic management tasks, while development and service teams should have admin consoles for simple support and admin operations anyways. The important thing here is to just remember that platform engineering is not the same thing as giving all developers root access.

The good parts of platform engineering

The best take on how platform engineering should be done that I’ve seen is this post by Charity Majors. Majors does a good job of explaining the benefits platform engineering brings to organizations, how it differs from traditional operations, and the types of engineers you need on a platform engineering team. I’d like to emphasize Majors’ point that a platform engineering team should be responsive to developer needs: there should be a paved road for any common (i.e. “used more than once”) architecture pattern.

Embracing platform engineering gives developers the ability to control their own destiny when it comes to running software, but with minimal risk since the paved roads have best practices built-in. Good platform engineering focuses on the types of problems development teams need to solve, and makes sure there’s a “good” choice for dealing with those – be it event handling, message queues, running a web service, or hosting a web application. By treating your operations platform as a product the way Majors describes, operations becomes responsive to business and development needs. It is worth noting that platform engineering is successful when the operations specialists are working closely with developers, i.e. “doing DevOps,” even if they’re separate teams.

Platform engineering as an engineering approach does a better job of acknowledging that development and operations are separate concerns than traditional DevOps does. I also appreciate the emphasis it puts on paved roads as a means of simplifying development, and on “productizing” (not the best word in the English language, but it’s the only term I could think of for “turning a back-and-forth process into a simple self-service operation”) the results of development and operations specialists working together. Done right, platform engineering does a good job of letting developers focus on business problems and code while significantly reducing the risks that come from people who don’t specialize in operations running code anyways.

Why I’m not convinced platform engineering is going to be the future

Platform engineering is a term that is growing in popularity, and has a definition that sounds a lot like “tooling.” Point blank, I think a lot of the places that say they’re going to do “platform engineering” are just going to rebrand their tooling teams, maybe give them a little more work to do, and call it a day. I had to Google around for a while before Charity Majors’s blog post above got recommended to me, and now I at least understand the basics of platform engineering. Given that it takes time and effort to understand platform engineering, and that it involves potentially changing the way businesses operate, I just don’t think companies are going to bother putting in that kind of effort. Real change is hard, cargo culting is easy.

Even if companies try to adopt a platform engineering approach “for real,” like I mentioned earlier, it’s hard. After all, what does an “internal development platform” even look like? Even if you build the resources and tooling needed to deploy code, developers still have to run that code with minimal impact to their day-to-day work. That means making it easy to update the runtime environments plus any libraries you’re providing that support running code. That’s a lot of work, which is why platform engineering is meant to be its own dedicated team.

Because platform engineering explicitly breaks operations out into its own group (they’re building the “platform”), it’s too easy to reinforce the “separate development and operations and have the former throw things over the wall to the latter” mentality that DevOps was trying to eliminate. While platform engineering is intended to operate like a DevOps product serving other developers, that’s a cultural change, and organizations claiming to adopt new approaches to development are notoriously bad about skipping the actual culture parts.

I like the idea of platform engineering. Platform engineering gives teams more control over their destiny, and as someone who’s worked both in places where deployments involved coordinating multiple teams and in places where developers could push their code when it was ready, I much prefer the latter. Platform engineering sounds like a silver bullet for companies looking to have high-velocity, high-performing development teams that aren’t bogged down actually running code. That said, given how companies “adopted” DevOps, I don’t see much hope that platform engineering won’t end up being yet another cargo-culted buzzword.