What it means to actually run an API

General Commentary, Programming

Oct 31, 2021

It’s really popular to say we’re writing web services built on top of RESTful APIs, but in reality most of us are writing web applications that make REST calls back to that app’s own server for data, not calling a generalized web service with multiple sets of users. That’s fine – I use web applications almost all day just like everyone else. Occasionally we need to get some data from another team’s data store, and it’s cool, they have an endpoint you can hit to get it. See? Web services with RESTful APIs. The problem is that there’s a difference between “There’s an endpoint that can give us that data” and running a service whose purpose is to be used by anyone other than the team that wrote it.

The key, all-encompassing word here is stability. Outside of a major version update (which should be announced well in advance), the only indication users should have that the service changed is whatever mechanism you use to send out release notes. Nothing is as frustrating as dog-fooding your company’s product, one that is marketed to your company’s end users as an API, seeing stuff stop working, tracking down the developer who wrote that part of the code, and having them tell you, “Yeah, we made a change that completely broke that, and there’s no way it’s ever going to work again without reverting that new feature we just deployed because it was a huge internal team goal.” Sure, this broke a publicly documented feature that was marketed to paying customers, but they don’t care – they’re announcing a shiny new feature. That’s what the difference between running a real API and just having an endpoint other people can use looks like.

If you want the back-end code you’re writing to be considered an API, the first thing you need to understand is that you and your team are not the users. In the best-case scenario, you’re writing a business-to-business API, which means the end users who will notice changes in your API happen to be the same developers writing the code that calls it. The more realistic view is that there’s at least one middleman between the developers actively consuming your API and the people who will immediately feel the pain if you push a change that alters their experience in any way. Playing telephone wasn’t much fun in elementary school, but if you want to write APIs that are taken seriously, you had better accept that’s how problems in your code are going to be surfaced.

Here’s a real-life scenario to illustrate what I mean about focusing on stability above all else. I have a service with REST endpoints that I’d like to be able to use as an API for other teams to interact with. Early on, I wrote a simple save endpoint that just overwrote what was in the database with the request payload. Well, as I built this out, I started updating fields in that same record as part of another process, so a save carrying a stale payload could silently overwrite those updates: a very easy-to-reproduce race condition. At the time, because I wanted the endpoints to be simple, I just added some conditional logic around the save, basically saying that I would only modify specific fields if the record already existed, and save the whole payload for new records. Eventually, I realized that was dumb and I should have just written an update endpoint, but second-guessing in software is a waste of time – the code worked, and something needed to be shipped. So now there’s code running in production with one endpoint that both creates new records and modifies existing ones. At some point I may add a second endpoint that just updates existing records, but now that this API is running, even though it’s only being used by us right now, stability demands that the existing dual-purpose endpoint keep doing both creates and updates until I create a /v2 set of endpoints (by the way, you are versioning your APIs, right?). That’s because for my API to be stable, any change I make to this version must require no code changes whatsoever on the part of anyone calling the API. Doing anything else breaks multiple code bases, without warning, in production environments. That’s inexcusable for software that’s supposed to be usable.
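To make the scenario concrete, here is a minimal Python sketch of a dual-purpose v1 save like the one described above. Everything in it is hypothetical: an in-memory dict stands in for the database, `UPDATABLE_FIELDS` is an invented whitelist of the fields the save may touch on an existing record, and the 201/200 status codes are an assumption rather than what the real service returns. A real database would also need an atomic update or row locking to fully close the race; the point here is only the shape of the v1 contract.

```python
from typing import Any

# Hypothetical whitelist: the only fields the v1 save may change on an existing
# record. Fields written by other internal processes are deliberately left out.
UPDATABLE_FIELDS = ("title", "notes")


class RecordStore:
    """In-memory stand-in for the real database (hypothetical)."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, Any]] = {}

    def save_v1(self, record_id: str, payload: dict[str, Any]) -> tuple[int, dict[str, Any]]:
        """v1 save: create if new, otherwise update only whitelisted fields.

        Returns an (HTTP status, resulting record) pair.
        """
        existing = self._records.get(record_id)
        if existing is None:
            # New record: persist the whole payload, as the original save did.
            self._records[record_id] = dict(payload)
            return 201, dict(payload)
        # Existing record: copy over only the fields this endpoint owns, so a
        # stale payload cannot clobber values another process has updated.
        for key in UPDATABLE_FIELDS:
            if key in payload:
                existing[key] = payload[key]
        return 200, dict(existing)

    def set_field(self, record_id: str, key: str, value: Any) -> None:
        """Simulates the other internal process updating the same record."""
        self._records[record_id][key] = value
```

Once callers depend on this create-or-update behavior, it is part of the v1 contract. An update-only endpoint could be added next to it, but the save itself has to keep behaving this way until v2.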

Another thing I’ve seen that makes well-run APIs stand out is the way they’re built to minimize the impact that problems on their side have on users. This can include things like writing data to queues or databases before processing, so that if there’s downtime, user data and changes aren’t lost, just delayed. It’s a concerted effort to make using the API as painless as possible, including SDKs in as many supported languages as possible, and processes that are as simple as possible for things like authentication/token renewal and performing updates (I’ve seen APIs that were big into returning immutable objects from API calls; updating data got…verbose…very quickly). It’s also taking every opportunity to behave the way most people assume HTTP calls will behave.
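A rough sketch of the write-before-processing idea, using a hypothetical `submit` handler and `process_pending` worker. Python’s in-process `queue.Queue` only stands in for something durable here; it would not survive a crash, so a real service would use a persistent message broker or a database table instead.

```python
import queue
from typing import Any

# Stand-in for a durable queue (hypothetical). An in-process queue.Queue is
# lost if the process dies; a real service would persist requests instead.
inbox: "queue.Queue[dict[str, Any]]" = queue.Queue()
processed: list[dict[str, Any]] = []


def submit(payload: dict[str, Any]) -> int:
    """Handler: record the request first, then acknowledge it."""
    inbox.put(payload)
    return 202  # Accepted: stored, but not yet processed


def process_pending() -> int:
    """Worker: drain whatever accumulated, e.g. after downstream downtime ends."""
    count = 0
    while True:
        try:
            item = inbox.get_nowait()
        except queue.Empty:
            return count
        processed.append(item)  # stand-in for the real processing step
        inbox.task_done()
        count += 1
```

The handler’s only job is to record the request and acknowledge it, so an outage on the processing side delays work instead of losing it.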

What do I mean by behaving the way people assume HTTP calls behave? First and foremost, use the established HTTP status codes as they were intended. If a user submitted something that was accepted but not yet acted on (e.g. a submission queued for later processing, so nothing has changed yet), return an HTTP 202 Accepted. If something went wrong, use the existing error status codes to tell the user. It’s frustrating to call an HTTP API, get a success response back, and still have to double-check the response body to see whether the call really failed, because the standard status code that’s supposed to answer that question isn’t trustworthy. HTTP clients are designed around HTTP standards, with the idea of making it easy for users to handle errors. Make your users’ lives easier and play into how everyone expects these things to work.
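As a sketch of letting the status code carry the outcome, here is a hypothetical create handler built on the standard library’s `http.HTTPStatus`. The field names, the `defer` flag, and the choice of 400/409/201/202 for each case are illustrative assumptions, not anything prescribed here.

```python
from http import HTTPStatus
from typing import Any


def handle_create(payload: dict[str, Any], existing_ids: set[str]) -> tuple[HTTPStatus, dict[str, Any]]:
    """Hypothetical create handler that reports failure through the status
    code, not through a 200 response with an error buried in the body."""
    record_id = payload.get("id")
    if not isinstance(record_id, str) or not record_id:
        return HTTPStatus.BAD_REQUEST, {"error": "id is required"}
    if record_id in existing_ids:
        return HTTPStatus.CONFLICT, {"error": f"{record_id} already exists"}
    if payload.get("defer"):
        # Queued for later processing: accepted, but nothing has changed yet.
        return HTTPStatus.ACCEPTED, {"id": record_id, "state": "queued"}
    existing_ids.add(record_id)
    return HTTPStatus.CREATED, {"id": record_id}


def succeeded(status: int) -> bool:
    # Callers can trust the status class alone; no need to parse the body.
    return 200 <= status < 300
```

Because each outcome has its own status, a caller can branch on the status class instead of inspecting the body to find out whether a 200 really meant success.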

Speaking of making APIs as easy as possible to use, nothing beats good documentation, not just of what the endpoints, parameters, and responses are, but also of what the calls are capable of in the first place. This goes beyond OpenAPI listings of the endpoints, parameters, and response objects. A list of methods, inputs, and outputs doesn’t cover things like the limitations of the API, or offer a bigger picture of where calls fit into the larger system the API provides an interface for. A README is nice, but the best APIs have full-fledged developer guides that include not only individual endpoint documentation, but also examples and context around what your service actually does, and nitty-gritty details about exactly how the return values work (things like: the list of comments on a post is returned in order of the commenting user’s internal ID, limited to 100 comments, with no pagination – not that I’m bitter about learning important details about an API from production issues instead of from the documentation or anything). Yes, I know the whole idea behind encapsulating things into multiple services is that you can treat them as black boxes, but a high-level summary of what’s going on winds up being extremely helpful.

It’s really easy to say that developers using your API shouldn’t have to worry about how your API works, just about what the API contract is and where it fits into their business logic. That’s a lot easier said than done. What makes working on APIs so difficult is that there are generally two or more degrees of separation between you and the people impacted by any problems you introduce (the developers using your API sit between you and their customers, who are the ones dealing with your problems). Because you can’t possibly know what everyone is doing with your API, all you can really do is make sure any changes you make don’t directly change anything for your users, because that causes unintended changes for their users (which is a weird thing to write, but it’s basically what you’re doing). That isn’t to say you can’t make fundamental changes to your API; you just have to version it and increment the version with any breaking change. That requires planning on your team’s part, because you still need to fix bugs and security flaws in the old version during the sunset period, which should be long enough to give users more than enough time to migrate to the newer version. You also don’t want to increment your version too often (Facebook always seemed particularly bad about this), as that starts to make using your API more trouble than it’s worth.
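One way to sketch versioning with a sunset period: keep the v1 handler frozen, ship the breaking change only under a new path prefix, and tell v1 callers when it will go away. This example uses the `Sunset` response header from RFC 8594; the routes, handlers, payload shapes, and the sunset date itself are all hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]

# Hypothetical sunset date for v1, announced well in advance.
V1_SUNSET = datetime(2022, 12, 31, tzinfo=timezone.utc)


def get_item_v1(params: dict[str, Any]) -> dict[str, Any]:
    # The v1 contract stays frozen: "name" is a single string.
    return {"name": "Ada Lovelace"}


def get_item_v2(params: dict[str, Any]) -> dict[str, Any]:
    # The breaking change (name split into parts) ships only under v2.
    return {"name": {"given": "Ada", "family": "Lovelace"}}


ROUTES: dict[str, Handler] = {"/v1/item": get_item_v1, "/v2/item": get_item_v2}


def dispatch(path: str, params: dict[str, Any]) -> tuple[int, dict[str, str], dict[str, Any]]:
    """Returns (status, headers, body) for a hypothetical versioned API."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {}, {"error": "not found"}
    headers: dict[str, str] = {}
    if path.startswith("/v1/"):
        # RFC 8594 Sunset header: an HTTP-date telling clients when v1 ends.
        headers["Sunset"] = format_datetime(V1_SUNSET, usegmt=True)
    return 200, headers, handler(params)
```

During the sunset period, bug and security fixes still land in `get_item_v1`; only the v2 handler is free to change shape.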

Saying you have an API means more than just having a server with REST endpoints behind some sort of authentication system. Your entire approach to development changes. You wind up having to adopt a “do no harm” philosophy to protect the experience of your users’ users. Outside a formal version upgrade, your changes have to be either additive or have no impact on any code calling your API. Doing it right feels like you’re developing a product with one hand tied behind your back, but that’s what providing a good experience to your users is like when you’re shipping an API instead of a product. If you’re unable to make that kind of commitment, then you’re not ready to say that you’re offering an API.
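A small sketch of what an additive change can look like, using a hypothetical comment endpoint: the new `visibility` field is optional on the way in, defaults to the old behavior, and only adds a key on the way out. It assumes, as is common for JSON clients, that callers ignore response fields they don’t recognize.

```python
from typing import Any


def create_comment_v1(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical v1 handler after an additive change.

    "visibility" is new and optional; callers written before it existed omit
    it and get exactly the old behavior.
    """
    visibility = payload.get("visibility", "public")  # default preserves old behavior
    if visibility not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")
    return {
        "text": payload["text"],   # existing field: same name, same type
        "visibility": visibility,  # new field: clients ignoring unknown keys are unaffected
    }


def old_client_read(response: dict[str, Any]) -> str:
    # A client written before the change reads only the fields it knows about.
    return response["text"]
```

Renaming `text` or changing its type, by contrast, would break `old_client_read`, which is exactly the kind of change that has to wait for a new version.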