May 31, 2023

Like just about everybody else who runs software post dot-com bust, I read about Amazon Prime Video re-architecting their monitoring service, and saw all the microservices vs. monolith hot takes. For the best monolith vs. microservice analysis of the post, I recommend CodeOpinion’s video. What I found more interesting about the post wasn’t the monolith vs. microservice debate (I agree with CodeOpinion – that’s not really the point of the original article), but rather the limits of using Functions as a Service (FaaS)…as a service.

Framing the before and after comparison here is tricky. Like I said already, it’s not microservices vs. monolith. The entire post focused on one service that stayed one service. Serverless vs. server-based doesn’t really work either, since things like ECS are technically serverless if you’re running them on Fargate. So what really describes what Prime Video was doing before this refactor vs. after? Ultimately, given that this refactor occurred after hitting the limits of building a service out of FaaS parts, I think the comparison we should be making is “running on FaaS” vs. “running in an instance” (for the record, when I say “instance” in this context, I’m referring to either “EC2 instance” or “container”).

It’s easy to believe that you can run a bunch of functions on their own and string them together as a service without deploying to some sort of instance, mostly because AWS says that you can. The reality is that’s true…up to a point. Lambda has a default limit of 1,000 concurrent executions per account per Region. AWS will raise that limit if you ask them nicely, but it still doesn’t go to “infinite.” Even if you can get by just fine with the default, you’re still paying for every execution of the function for as long as it runs. For small things that need to be done periodically, or that have low, consistent usage, that’s not a big deal. For something like video and audio quality monitoring for a video streaming service (just to use a random example), you’re going to have a lot of functions running all the time.
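A quick way to sanity-check whether a workload fits under that limit is the usual steady-state estimate: concurrent executions are roughly requests per second times average duration in seconds. The sketch below is only that arithmetic. The stream count and check duration are made-up numbers for illustration, not figures from the Prime Video post.

```python
import math

DEFAULT_CONCURRENCY_LIMIT = 1000  # the default limit discussed above


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Steady-state estimate: executions in flight = arrival rate x duration."""
    return math.ceil(requests_per_second * avg_duration_s)


def fits_default_limit(requests_per_second: float, avg_duration_s: float) -> bool:
    """True if the estimated concurrency stays within the default limit."""
    return required_concurrency(requests_per_second, avg_duration_s) <= DEFAULT_CONCURRENCY_LIMIT


# Hypothetical workload: 5,000 streams, each checked once per second,
# with each check taking half a second.
needed = required_concurrency(requests_per_second=5000, avg_duration_s=0.5)
```

With those assumed numbers the estimate comes out well above the default, which is the kind of result that should prompt the "is FaaS the right fit?" question before anything gets built.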

The thing to remember here is that scaling horizontally and scaling vertically aren’t mutually exclusive. You can do both (the Prime Video post discusses adding more instances with some orchestration near the tail end of the article). It’s popular, especially when talking about microservices, to focus on scaling horizontally across smaller servers as a cost-saving measure, but it can be cheaper to run a smaller number of bigger machines. In fact, that’s what Prime Video wound up doing in order to get the level of scaling they needed. And of course, you can still add more copies of the larger instances as needed. Optimizing instance cost is largely a series of arithmetic problems, but most of the time we don’t have (or want to take) the time to plug the numbers into a spreadsheet and compare them.
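For what it’s worth, the spreadsheet version of that arithmetic is small enough to fit in a few functions. This is a rough sketch, assuming a FaaS bill made up of a per-GB-second compute charge plus a per-request charge, and an instance bill that is just hourly price times hours. Every price below is a placeholder I made up, so plug in the current numbers from your provider’s pricing page.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12


def faas_monthly_cost(invocations, avg_duration_s, memory_gb,
                      price_per_gb_second, price_per_million_requests):
    """Compute charge (GB-seconds) plus per-request charge."""
    compute = invocations * avg_duration_s * memory_gb * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


def instance_monthly_cost(instance_count, hourly_price, hours=HOURS_PER_MONTH):
    """Always-on instances: you pay for the hours whether they are busy or not."""
    return instance_count * hourly_price * hours


def cheaper_option(faas_cost, instance_cost):
    return "faas" if faas_cost < instance_cost else "instances"


# Placeholder prices, NOT real quotes.
PRICE_PER_GB_SECOND = 0.00002
PRICE_PER_MILLION_REQUESTS = 0.20
HOURLY_INSTANCE_PRICE = 0.50

low_volume = faas_monthly_cost(1_000_000, 0.5, 1.0,
                               PRICE_PER_GB_SECOND, PRICE_PER_MILLION_REQUESTS)
high_volume = faas_monthly_cost(100_000_000, 0.5, 1.0,
                                PRICE_PER_GB_SECOND, PRICE_PER_MILLION_REQUESTS)
two_instances = instance_monthly_cost(2, HOURLY_INSTANCE_PRICE)
```

Under these assumed prices the low-volume case favors FaaS and the high-volume case favors the two instances. The crossover point depends entirely on the real prices and on whether two instances can actually carry the load, which is the part you still have to measure.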

So should you use FaaS at all? Sure, but as with any other software approach, it’s worth understanding the type of problem FaaS is intended to solve and making sure your use case falls into that problem type. FaaS excels at low-to-moderate volume, one-off functions that run in response to some external action. Obviously, that includes functions that need to run on a schedule, but it also works well with events from users or external teams (e.g. sending a cache invalidation API call to a CDN when a file is uploaded, or triggering an email to a customer when some external process is committed – anything that can be described as “it’s like a database trigger, but for {X}”).
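To make the “database trigger, but for {X}” idea concrete, here is a minimal sketch of the cache invalidation case as a Lambda-style handler reacting to an S3 upload notification. The `purge_cdn_paths` function is a hypothetical stand-in for whatever invalidation call your CDN actually exposes, and the optional `purge` parameter is only there so the handler can be exercised without a network.

```python
from urllib.parse import unquote_plus


def purge_cdn_paths(paths):
    """Hypothetical stand-in for a real CDN invalidation API call."""
    return {"purged": list(paths)}


def uploaded_paths(event):
    """Extract object keys from an S3 event notification as URL paths."""
    paths = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        # S3 URL-encodes keys in event notifications (spaces arrive as "+").
        paths.append("/" + unquote_plus(key))
    return paths


def handler(event, context, purge=purge_cdn_paths):
    """Lambda-style entry point: one upload event in, one invalidation out."""
    paths = uploaded_paths(event)
    if not paths:
        return {"purged": []}
    return purge(paths)
```

The whole function is a few lines, holds no state, and only runs when something happens, which is the shape of problem FaaS handles well.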

Should you pile these functions together using a tool like AWS Step Functions and turn that into its own self-contained pipeline? You can, if you’re putting together a workflow that your team controls end-to-end, you’re all-in on ephemeral deployments, and each step requires minimal input (e.g. just a JSON object) in order to run. Generally speaking, those pipelines work best at low to moderate volumes as well (you certainly could process high-volume workflows, but it gets expensive, as Prime Video discovered). The problem with serverless functions is that they tend to scale very well horizontally, but not as well vertically. For workflows that involve a lot of data (like, say…a bunch of a bunch of images), this denies you the ability to just keep that data in memory (or at least on-disk) between steps, so every step pays to fetch and store it again, adding cost to your operation at every step.
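Here is a toy illustration of that last point, assuming each function in the pipeline has to load its input from shared storage and write its output back, while a single process loads the data once and keeps it in memory. `CountingStore` is a made-up in-memory stand-in for an object store that simply counts round trips. It isn’t modeled on any real SDK.

```python
class CountingStore:
    """In-memory stand-in for an object store that counts round trips."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0
        self.writes = 0

    def put(self, key, data):
        self.writes += 1
        self.blobs[key] = data

    def get(self, key):
        self.reads += 1
        return self.blobs[key]


def run_as_functions(store, input_key, steps):
    """Each step is its own function: load the input, store the output."""
    key = input_key
    for index, step in enumerate(steps):
        data = store.get(key)
        key = f"{input_key}/step-{index}"
        store.put(key, step(data))
    return key  # key of the final output


def run_in_process(store, input_key, steps):
    """One process: load once, then pass data between steps in memory."""
    data = store.get(input_key)
    for step in steps:
        data = step(data)
    return data
```

For a pipeline of N steps, the function-per-step version does N reads and N writes against storage where the in-process version does one read. With small JSON payloads nobody notices. With large media payloads at high volume, those round trips become a line item.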

Personally, I’ve found that processor-intensive operations, especially those that need to be constantly running, are best served by dedicated instances. It can easily be cheaper to just reserve your baseline instances. You can still keep your workflow intact by having that instance run data through a template method, and you still retain the ability to spin up more instances if you need them. Yes, it’s technologically possible to string together a bunch of disparate functions in order to make a service, but you have to ask yourself if that’s the most cost-effective, efficient, or grokkable way of doing it. I think you’ll find that once you start having to run that service at scale, you’ll come to the same answer that the Prime Video team did – “No.”
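If the template method pattern isn’t familiar, the idea is that a base class fixes the order of the workflow steps and subclasses supply the steps themselves, so the “pipeline” survives even though everything runs in one process. A minimal sketch follows. The class and step names (`MonitoringPipeline`, `decode`, `analyze`, `report`) are invented for illustration and are not how Prime Video structured their service.

```python
from abc import ABC, abstractmethod


class MonitoringPipeline(ABC):
    """Template method: run() fixes the order of steps; subclasses fill them in."""

    def run(self, chunk):
        frames = self.decode(chunk)
        defects = self.analyze(frames)
        return self.report(defects)

    @abstractmethod
    def decode(self, chunk):
        """Turn a raw chunk into a list of frames."""

    @abstractmethod
    def analyze(self, frames):
        """Return the indexes of frames that look defective."""

    def report(self, defects):
        # Default step; subclasses may override it.
        return {"defect_count": len(defects), "defects": defects}


class BlankFrameMonitor(MonitoringPipeline):
    """Toy implementation: a frame is 'defective' if it is empty."""

    def decode(self, chunk):
        return list(chunk)

    def analyze(self, frames):
        return [index for index, frame in enumerate(frames) if not frame]


result = BlankFrameMonitor().run(["frame-a", "", "frame-c", ""])
```

Each step is still a separate, testable unit, but the frames never leave the process between steps, and scaling out is just a matter of running more copies of the same instance.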

Posted at 11:45 AM