Software complexity is like real estate – it’s all about location, location, location

Dec 31, 2023

Eventually, all software becomes complex. Projects that run entirely in the command line are rare, and even simple little web applications seem to get more involved once you actually want to run them somewhere other than localhost. Sure, you have your executable package, but there’s also likely to be a bunch of things that don’t exist on your local machine, like load balancing, multiple instances, probably some sort of metrics agent running alongside the code, likely some sort of caching – the list goes on. Code that’s used by others always gets more complex than something used only by you, as it evolves to make sure that no matter what people do, the application doesn’t crash, error out, or get its data into an invalid state. What’s important is acknowledging that software naturally gains complexity as it moves from “local project” to “running in production,” and making sure we’re paying attention to where we’re adding this complexity, and why it’s there.

When thinking about software complexity, it’s helpful to break it up into two types. There’s architectural complexity – the different pieces and components involved in how the code is deployed and run, and there’s logical complexity – how the code is organized, structured, and what it actually does. Both are important and need to be actively managed. What’s important is focusing on why you’re adding complexity, namely the actual business value you’re deriving from making things more complicated.

It’s important to remember complexity isn’t bad per se, as long as it’s being introduced deliberately and for a specific purpose. We add additional servers (and write our back-end request handlers to be stateless) to enable redundancy and let us scale up if traffic spikes. We add complicated logic to our code because we’re dealing with complex business rules that need to be enforced for any software to be usable. The key is adding the correct complexity to the right place to maximize value, and actively avoiding complexity literally everywhere else.

The good news is, most modern software tooling is basically designed to abstract out complexity. Frameworks like Spring Boot let us run a web server, ORMs encapsulate dealing with databases and mapping tables to objects in code, and there are tools to handle converting those objects into JSON in order to make API calls and read responses. It may be tempting to think of all of this as “boilerplate,” but keep in mind, if these libraries and frameworks didn’t exist, it’d all be code you’d have to write and maintain – which would mean the added complexity of running a web server, managing DB connections, and serializing and deserializing your objects on top of running your business logic. These tools are certainly helpful, but only to the extent that they make our lives easier. You still have to be willing to abandon them if they don’t. (I removed any logic using a Spring Boot JPA repository object not too long ago because the findBy({ID}) logic was returning the first database record repeatedly instead of each record in the table. That got replaced by a simple JDBC object doing the query and me mapping the fields to my Java object myself, because I didn’t have time to dig into Spring Boot and debug how its auto-generated query logic for repositories works.)
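
As a rough illustration of that kind of hand-written JDBC mapping, here is a minimal sketch. It is not the code from the incident above: the Customer type, the customers table, and its column names are all hypothetical, and the caller is assumed to supply an open java.sql.Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain object; the table and fields are assumptions for illustration.
record Customer(long id, String name, String email) {}

final class CustomerQueries {

    // Assumed table and column names, chosen only for this sketch.
    private static final String SELECT_ALL_CUSTOMERS =
            "SELECT id, name, email FROM customers ORDER BY id";

    // Copies one row's columns into a Customer, field by field.
    static Customer mapRow(ResultSet row) throws SQLException {
        long id = row.getLong("id");
        String name = row.getString("name");
        String email = row.getString("email");
        return new Customer(id, name, email);
    }

    // Runs the query and maps every row; the caller owns and closes the Connection.
    static List<Customer> findAll(Connection connection) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SELECT_ALL_CUSTOMERS);
             ResultSet rows = statement.executeQuery()) {
            List<Customer> customers = new ArrayList<>();
            while (rows.next()) {
                customers.add(mapRow(rows));
            }
            return customers;
        }
    }
}
```

It is more typing than a generated repository method, but every step is visible in one place, which is the trade being described here.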

Recently, I’ve been starting to view complexity as a recurring technical expense. Sort of like recurring bills when thinking about money, software complexity is something that has to be dealt with just to run or work with the software. Just like paying a subscription fee or recurring monthly bill, the “expense” is worth it so long as I get more value out of the software than I lose in dealing with the complexity. That said, complexity is a “cost” that I’m “paying” regularly, so like monetary costs, I want to keep it under control, and minimize it as best as I’m able.

Architectural Complexity

An easy way to think of architectural complexity is “how many blocks and arrows are there on this system’s architectural diagram?” Is it simple, with code deployed in as few places as possible, or are the logic and data needed to perform the whole system’s functions spread out all over the place? The important questions to ask yourself when considering how complex your software architecture is are “Why did we decide to do things this way?” and “What constraints were on us when we first thought of this, and are those constraints physical, business, or just something we imposed on ourselves?” For that third category of constraints, ask yourself – “What does it do to our overall system design to assume they don’t exist?” Ideally, we’re asking ourselves this while the software architecture is in the design phase, but if you’re having issues with a service and are looking to improve reliability, performance, or the ability to onboard new developers to the project, you’re at least asking the question at a point where people might be amenable to refactoring the codebase into something simpler.

If you are in the process of designing a system, the questions to make sure you’re focusing on are: “What are the problems covered by this aspect of the business?” and “Is there any part of this design that is not there to solve those problems?” For a generic e-commerce example, a billing service would answer those questions as “We need to track the services rendered for which we have not been paid, collect and process said payment, and be able to provide customers with the status of said payments.” and “Anything that does not receive new order information (so we know what new payments are due), hold the invoice information, collect payment information, process payment information (i.e. charge a customer’s credit card), provide invoice/payment status to users and customers, or secure said data against unauthorized access or PII leaking out of the system.”
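
One way to make that boundary concrete is to write it down as an interface and check every proposed addition against it. The sketch below is only an assumption about how such a billing service might look: BillingService, InvoiceStatus, Invoice, and the in-memory implementation are hypothetical names, and real payment processing and access control are deliberately left out.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

enum InvoiceStatus { UNPAID, PAID }

record Invoice(String orderId, BigDecimal amountDue, InvoiceStatus status) {}

// Hypothetical boundary: each method maps to one responsibility named in the text.
interface BillingService {
    // Receive new order information so we know what payments are due.
    void receiveNewOrder(String orderId, BigDecimal amountDue);

    // Collect and process a payment for an existing invoice.
    void collectPayment(String orderId, String paymentToken);

    // Provide invoice/payment status to users and customers.
    Optional<InvoiceStatus> invoiceStatus(String orderId);
}

// In-memory stand-in so the sketch runs; not a production design.
final class InMemoryBillingService implements BillingService {
    private final Map<String, Invoice> invoicesByOrderId = new ConcurrentHashMap<>();

    @Override
    public void receiveNewOrder(String orderId, BigDecimal amountDue) {
        invoicesByOrderId.put(orderId, new Invoice(orderId, amountDue, InvoiceStatus.UNPAID));
    }

    @Override
    public void collectPayment(String orderId, String paymentToken) {
        // A real service would charge the card through a payment processor here;
        // this sketch ignores the token and only marks a known invoice as paid.
        invoicesByOrderId.computeIfPresent(orderId,
                (id, invoice) -> new Invoice(id, invoice.amountDue(), InvoiceStatus.PAID));
    }

    @Override
    public Optional<InvoiceStatus> invoiceStatus(String orderId) {
        return Optional.ofNullable(invoicesByOrderId.get(orderId)).map(Invoice::status);
    }
}
```

If a proposed feature needs a method that does not fit one of those responsibilities, that is a hint it belongs in a different service.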

It’s tempting to note at this point that all software architecture is simple until you start building it for resilience and reliability, which is a very good point. The architecture needed to successfully run on your local machine is very different from the architecture needed to run in production with many more users hitting the system all at once, where anything could go wrong but you still need to be up and not lose data. But if we focus specifically on “what is the harm being done to our business if this goes down,” and “how much usage can we expect this system to have,” we can design an architecture that fulfills our service-level objectives in as simple a manner as possible.

For example, if the concern about services going down is that other services lose access to data they need, you can either store a local copy of external data as a backup, or use a cached copy. If the problem is that an operation can’t complete because an upstream service is down, then perhaps it’s time to start pitching the idea of moving to a messaging- or event-based architecture. That adds some complexity (introducing caching in the first example, and the messaging/event system in the second, along with associated observability), but it’s focused only on the specific problems your system is really worried about. In my experience, the simplest, most reliable, and generally best ideas come from a focus on the problem and its explicit constraints, paired with ruthlessly ignoring everything else (e.g. the nine dots puzzle).
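
The cached-copy option can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full caching strategy: CachedFallbackClient is a hypothetical name, the upstream call is modeled as a plain function that throws a RuntimeException when the service is down, and there is no expiry or size limit on the cached values.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical wrapper: serves the last known value when the upstream call fails.
final class CachedFallbackClient<K, V> {
    private final Function<K, V> upstreamLookup;
    private final Map<K, V> lastKnownValues = new ConcurrentHashMap<>();

    CachedFallbackClient(Function<K, V> upstreamLookup) {
        this.upstreamLookup = upstreamLookup;
    }

    Optional<V> fetch(K key) {
        try {
            V freshValue = upstreamLookup.apply(key);
            if (freshValue == null) {
                // Upstream answered "no such record", so drop any stale copy.
                lastKnownValues.remove(key);
                return Optional.empty();
            }
            lastKnownValues.put(key, freshValue);
            return Optional.of(freshValue);
        } catch (RuntimeException upstreamFailure) {
            // Upstream is unavailable: fall back to the last value we saw, if any.
            return Optional.ofNullable(lastKnownValues.get(key));
        }
    }
}
```

The added complexity is one class and one map, and it exists only to answer the specific concern of losing access to data when another service is down.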

Designing services in a way that makes them truly reliable adds complexity, so it’s important that you’re focusing on the problems caused by your service going down in order to both solve for reliability and make sure you’re not making your service harder to understand, deploy, and manage than it needs to be. The goal should be the simplest possible architecture diagram you can draw that:

Solves the problems associated with the service’s area of responsibility
Enforces the logical boundaries around that area of responsibility (e.g. the only way for a task in this part of the business to be performed is to go through this service)
Is available when users and consumers need to interact with it directly
Can still run if other services are down
Doesn’t lose data that the business needs or fail to process anything

Don’t try to solve problems that don’t need to be solved. Keep the focus on the business value your service should be providing, and mitigating the harms that come from your service being unavailable.

Logical Complexity

I’m using the term “logical complexity” to refer to any complexity in the code itself. This covers both what the code is doing, and how the code is organized. Structuring your project like the enterprise edition of FizzBuzz makes it more complex (without adding a single bit of value). The code is, hands-down, the easiest place to add complexity to your application, generally because people are regularly adding to it, but we hardly ever review the result holistically to see if it can be simplified after the changes are made. That said, it’s important to push back on any complexity that isn’t adding business value (which I’m defining as “things that make the business money”).

The place where adding complexity is going to actually have the biggest payoff is in business logic (worth noting – the payoff is in the actual business logic, not input validation). This is the place where you’re actually differentiating your app or service from competitors, solving hard problems, and just in general doing the stuff that makes your code valuable and useful to someone. I know the “software engineers should understand the business” line is cliche, but it’s true, and it’s for stuff like this. Software gains complexity over time, but you want to make sure that the increased cognitive load you’re going to be dealing with is worth it. That means making sure you’re only adding complexity to things that also add value, and pushing back against adding it anywhere else.

Pushing back on adding complexity anywhere else means any code that isn’t related to your business’s core competencies should be as easy to read and understand as possible. That means full-length variable names everywhere (no more single-letter variable names), taking multiple lines of code to do things (even if they’re simple), writing lots of comments, and no unnecessary abstractions. Put simply, if developers have to look at a block of code and stop to think in order to figure out what it does, or chase logic around umpteen files and references, your code is too complex and you’re wasting valuable brain power that should be getting used elsewhere.
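
To make that concrete, here is a small sketch of routine, non-core code written in that style. The Pagination class and countPages method are hypothetical examples, not taken from any real project; the one-line version is shown in a comment only for contrast.

```java
final class Pagination {

    // Terse version, shown only for contrast:
    //   static int p(int n, int s) { return (n + s - 1) / s; }

    // Returns how many pages are needed to show every item.
    static int countPages(int totalItems, int itemsPerPage) {
        if (totalItems < 0) {
            throw new IllegalArgumentException("totalItems must not be negative");
        }
        if (itemsPerPage <= 0) {
            throw new IllegalArgumentException("itemsPerPage must be positive");
        }

        // Pages that are completely filled with items.
        int fullPages = totalItems / itemsPerPage;

        // Items left over after the full pages; these need one extra page.
        int leftoverItems = totalItems % itemsPerPage;
        boolean needsPartialPage = leftoverItems > 0;

        if (needsPartialPage) {
            return fullPages + 1;
        }
        return fullPages;
    }
}
```

Both versions compute the same thing for valid input, but the longer one can be read top to bottom without anyone having to stop and work out the arithmetic trick.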

In short, software complexity is something that should be saved for things that are unique to your business and product, and should generally be avoided in every other aspect. Try to save the thinking and cleverness for things that only your company can do. Netflix used to discuss their rule for how they decided to open source a piece of software – they’d open source anything that didn’t offer more of an advantage to a competitor than to other people. In other words, the “undifferentiated heavy lifting.” In our code, the “undifferentiated heavy lifting” should be kept simple, so the “recurring cost” of complexity is tied up in the parts of the application where it really matters.

Software complexity isn’t bad. We use software all the time to solve problems that would be prohibitively difficult to solve manually (either at all or at scale). Building a system that can stand up to failure adds complexity, but it’s worth it to know that the code our businesses rely on keeps running. Unnecessary software complexity is the problem. Code and architecture aren’t required to have as many pieces and components as humanly possible. Start with the simplest way of organizing a solution, think of what could go wrong and exactly why that’s bad for the business, and change your architecture to address those specific problems. The same thing goes for the code itself – start with the simplest project structure, and then start putting in the simplest possible logic, making small refactors regularly to clean things up and make following the thought process behind the code as easy as possible. Don’t make people stop and think in order to be able to understand anything that isn’t a key facet of your a) business domain or b) availability. Businesses deal with hard enough problems. Don’t let the software be an additional one.