Oct 31, 2018

I got a comment on my post about performance tuning a REST API call asking about code tuning with examples. I don’t have code examples handy, but I can certainly run through some general performance tuning tips I’ve found over the years. No matter where you find advice about improving your code’s performance, keep in mind that every situation is different, and those differences could impact the relevance and usefulness of any advice offered.

Architecture

Right off the bat, make sure your code can scale horizontally. This is especially true if you’re running in a public cloud. If your load is spiky, predictably or not, then sometimes the best and easiest option is to run some more servers for a little while. Hardware is cheap, and spinning up some temporary servers every now and again is a lot faster than a full-on refactoring. Performance-tuning code should be reserved for something that consistently has a heavy workload, or that needs to run as close to instantly as possible but is instead consistently minutes behind.

Code structure

Another general-purpose tip is to parallelize your code as much as is reasonable. For example, a multi-tenant service could group its data by customer and then process each customer in parallel (just be careful if you’re using Java’s parallelStream, which by default runs on the JVM-wide common ForkJoinPool that the rest of your application shares).
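
Here is a minimal sketch of that per-customer idea, not code from any real service. The Event class, the totalsByCustomer method, and the choice to sum amounts are all made up for illustration. It assumes each customer’s work is independent of every other customer’s, and it uses a dedicated thread pool instead of parallelStream.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class PerCustomerProcessing {
    // Hypothetical unit of work belonging to one customer (illustration only).
    static final class Event {
        final String customerId;
        final int amount;

        Event(String customerId, int amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Groups events by customer, then sums each customer's events as its own task.
    static Map<String, Integer> totalsByCustomer(List<Event> events, int threads) {
        Map<String, List<Event>> byCustomer =
                events.stream().collect(Collectors.groupingBy(event -> event.customerId));

        // A dedicated pool keeps this work off the common ForkJoinPool
        // that parallelStream() uses by default.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, CompletableFuture<Integer>> pending = new HashMap<>();
            byCustomer.forEach((customerId, group) -> pending.put(
                    customerId,
                    CompletableFuture.supplyAsync(
                            () -> group.stream().mapToInt(event -> event.amount).sum(), pool)));

            Map<String, Integer> totals = new HashMap<>();
            // join() blocks until that customer's task has finished.
            pending.forEach((customerId, future) -> totals.put(customerId, future.join()));
            return totals;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sizing the pool yourself also gives you a knob to turn if the parallel work starts starving the rest of the application.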

Another thing I recommend as a general good practice (although it likely won’t make a huge difference in run times) is to bail out of processing as early as you can. There’s no sense wasting milliseconds operating on data if you already have enough information to tell it’s going to amount to naught. Again, don’t expect major performance improvements, but every little bit helps.
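
As a rough sketch of what bailing out early can look like, the method below runs its cheap checks before any expensive work. The Order class and the priceLookup parameter are hypothetical; priceLookup stands in for whatever slow per-item step your code has.

```java
import java.util.List;
import java.util.function.ToIntFunction;

class EarlyExitSketch {
    // Hypothetical input type, for illustration only.
    static final class Order {
        final boolean cancelled;
        final List<String> itemIds;

        Order(boolean cancelled, List<String> itemIds) {
            this.cancelled = cancelled;
            this.itemIds = itemIds;
        }
    }

    // priceLookup stands in for any expensive per-item step, such as a remote call.
    static int totalPrice(Order order, ToIntFunction<String> priceLookup) {
        // Cheap checks first: a cancelled or empty order can never produce a total,
        // so return before doing any expensive work.
        if (order.cancelled || order.itemIds.isEmpty()) {
            return 0;
        }
        int total = 0;
        for (String itemId : order.itemIds) {
            total += priceLookup.applyAsInt(itemId);
        }
        return total;
    }
}
```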

Database-related advice

In my personal experience, the best place to focus when tuning code is the database-related code. Done right, tweaking how you process data that you just read from, or are about to write to, the database can give your code a big speed-up.

First and foremost, if you are running queries that could return the same record(s) on subsequent calls, cache your query results. Reading from an in-memory cache is almost always faster than going out to the database.
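
A bare-bones version of such a cache might look like the sketch below. QueryCache and its loader parameter are names I made up; the loader stands in for your real database query. It assumes the cached records rarely change, and it has no size limit or time-based expiry, which a production cache would want.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class QueryCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the real database query

    QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent only calls the loader on a cache miss.
        // A null result is not stored, so "not found" is looked up again next time.
        return cache.computeIfAbsent(key, loader);
    }

    // Call this when the underlying record changes so stale data is not served.
    void invalidate(K key) {
        cache.remove(key);
    }
}
```

Something like new QueryCache<Long, String>(id -> lookUpName(id)) would then wrap a query, with lookUpName being whatever hypothetical method currently hits the database.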

The second database-related tip I have is to batch your writes to the database. If you’re using some form of ORM, you’re probably used to writing an object to the database using something like myDatabase.save(myObject); That works well enough for small volumes of data, but if you’re trying to process and update thousands of records, it may behoove you to run one update query that modifies all the records at once. Granted, this only helps if you’re trying to update (or clear) a lot of pre-existing records, but it’s something to aim for if your problem set is conducive. The same goes for deleting records, by the way.
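
The sketch below shows the shape of that idea without tying it to a particular ORM or driver. BulkUpdater and markProcessed are hypothetical stand-ins for whatever runs a single "update these IDs" statement in your stack. Splitting the IDs into chunks is my own addition, on the assumption that your database caps how many parameters one statement can carry (many do).

```java
import java.util.List;

class BulkUpdateSketch {
    // Hypothetical stand-in for code that runs ONE update statement covering all given IDs.
    interface BulkUpdater {
        int markProcessed(List<Long> ids); // returns the number of rows affected
    }

    // Issues one update per chunk of IDs instead of one save() per record.
    static int markAllProcessed(List<Long> ids, int chunkSize, BulkUpdater updater) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int updated = 0;
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            updated += updater.markProcessed(ids.subList(start, end));
        }
        return updated;
    }
}
```

With 2,500 records and a chunk size of 1,000, that is three round trips to the database instead of 2,500.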

Another tip I have is to page your reads. Page size is dependent on your data, but a lot of APIs and database libraries have reasonable defaults that you can run with. The big thing here is not to try to read all possible records that match a query at once. Doing so can cause memory issues if the data set is large enough, and loading many pages’ worth of data in one go makes your application pause while everything loads (page sizes usually default to something that returns quickly).
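
A paging loop can be as small as the sketch below. PageFetcher is a hypothetical interface standing in for your paged query (for example, one built on an offset and a limit), and forEachPage is a made-up helper name. It assumes the fetcher returns pages in a stable order.

```java
import java.util.List;
import java.util.function.Consumer;

class PagedReadSketch {
    // Hypothetical stand-in for a paged query against the database.
    interface PageFetcher<T> {
        List<T> fetch(int offset, int limit);
    }

    // Reads one page at a time, hands it to the handler, and returns the record count.
    static <T> int forEachPage(PageFetcher<T> fetcher, int pageSize, Consumer<List<T>> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        int total = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            handler.accept(page);
            total += page.size();
            if (page.size() < pageSize) {
                break; // a short page means this was the last one
            }
            offset += pageSize;
        }
        return total;
    }
}
```

Only one page is held in memory at a time, so memory use stays flat no matter how many records match.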

This brings me to another point – the best thing you can do is try to avoid operating on “all the _______ records” whenever possible. Anything you can do to filter or reduce the number of records you have to work on is worthwhile. We dramatically improved an evaluation engine at an old job by no longer querying the database for any records that might need to be evaluated and instead adding records to a “to be evaluated” table whenever an action that would warrant evaluation was performed. Avoiding the database query in favor of explicitly listing the records that were due for evaluation literally saved us minutes. Things that you can do to limit your queryable dataset are big sources of time savings.
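
To make the “to be evaluated” idea concrete, here is an in-memory stand-in for that kind of table. It is not the engine from that job: PendingEvaluations, markForEvaluation, and drain are names invented for this sketch, and a real version would live in the database so the list survives restarts.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PendingEvaluations {
    // In-memory stand-in for a "to be evaluated" table.
    // A set means a record touched twice is still only evaluated once.
    private final Set<Long> pending = new LinkedHashSet<>();

    // Called from any code path that performs an action warranting evaluation.
    synchronized void markForEvaluation(long recordId) {
        pending.add(recordId);
    }

    // Returns everything due for evaluation, in insertion order, and clears the list.
    synchronized List<Long> drain() {
        List<Long> due = new ArrayList<>(pending);
        pending.clear();
        return due;
    }
}
```

The evaluation job then works through whatever drain() returns rather than scanning every record to find out which ones changed.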

That dovetails into another piece of advice: sometimes making sure you’re using well-indexed queries isn’t enough, and those queries can still take a while to execute. That often happens on collections with a very large number of records. One easy thing you can do about that is to automatically expire or delete data after a certain period of time (it’s also a best practice anyway). This clears out old data that would otherwise clog up your database and slow down queries by forcing the database to slog through useless records.
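
The retention rule itself is simple enough to sketch in a few lines. The version below works on an in-memory map purely for illustration. RetentionSketch and purgeExpired are hypothetical names, and in practice the same cutoff would go into a scheduled delete or your database’s built-in expiry feature, if it has one.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

class RetentionSketch {
    // Removes entries created before (now - retention) and returns how many were removed.
    // The map must be mutable; it stands in for a table keyed by record ID.
    static <K> int purgeExpired(Map<K, Instant> createdAtById, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        int before = createdAtById.size();
        // Entries created exactly at the cutoff are kept; only strictly older ones go.
        createdAtById.values().removeIf(createdAt -> createdAt.isBefore(cutoff));
        return before - createdAtById.size();
    }
}
```

Passing in the current time as a parameter, rather than calling Instant.now() inside, keeps the rule easy to test.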

Improving the performance of software generally involves either throwing more cores at the problem (from your existing machine or by adding more machines) or reducing how many records you’re dealing with. The catch with the latter is that it typically doesn’t involve better database queries, but rather thinking of any trick you can to reduce your dataset size before querying. This is where the real cleverness and creativity come in, and sadly there’s no universal trick to it.

Posted at 11:45 AM