Jan 312022

So I was working through an issue where I’m having to put together a multi-table query that gets run as part of a scheduled job, and as I’m taking a break after finding the XML file and getting this query (and resulting data mapping) added to said XML file, I come across this tweet, and it described exactly what I was feeling.  Why was I shoving queries into an XML file shunted away in my  resources  directory instead of just building it in the code? Surely there are better ways to construct SQL queries, right? Well, maybe, but in the end, what I was doing was probably better than I thought it was at the time.

Generally, we use object-relational mapping to translate a record in a database into a plain old {insert programming language name here} object. For simplicity, I’m going to assume you’re using Java, mostly because I’m already used to using the acronym POJO and now I don’t have to change anything to write this blog post. If you’re programming experience consists almost entirely of databases with a document-based model (like Mongo), ORMs probably seem like a huge amount of overkill. That point of view makes sense – your data model is largely (although not exactly) a JSON document. That probably led a lot of people designing their database so that collection field names matched their POJO field names almost exactly (possibly excluding the _id column – if you’re the Mongo sort – in your document database of choice).

Compare that with relational databases, particularly in cases where the column names don’t match object names, and all of a sudden having a library that handles converting data from your database into POJOs for you to use makes a lot more sense, and becomes a lot more useful, especially considering that SQL data types aren’t Java data types. And to be honest, if we did a better job of advertising ORMs as solving that problem, and only that problem, we’d probably all be a lot happier.

But if you look at the quoted tweet – the issue wasn’t ORMs, it was writing database queries in the context of the ORMs. And that experience…varies. Let’s look at Hibernate, an ORM for Java so popular that for years it went hand-in-hand with Spring when talking about enterprise applications. Turning to their tutorials for the most recent production version, and their queries are pretty explicit and verbose about going through every step of the process, but ultimately let you construct a SQL query in a DSL that feels comfortably enough like Java. Sure, for some (probably smaller, more basic) queries it’d likely seem much more sensible to just write the raw SQL, but easy little queries aren’t why we put the time into setting up ORMs and working with them.

For document-based databases, there’s Morphia for Mongo, The process for building queries is similar to Hibernate (but cleaner and more concise). Spring Data is even more simplified for basic queries, and can limit more complicated queries to a  @Query annotation on the method. But with both of these, as with Hibernate, the second you try to do something more complicated than some basic CRUD gets ugly.

Then there’s options like MyBatis, where you write out your queries and data-to-POJO mappings in XML files, and then reference those in your code through namespace matching – also known has “having to manually keep something consistent in multiple places.” This certainly feels worse than the Hibernate option did. Sure I’m writing raw SQL, but I’m writing it in an XML file, where I’m also manually mapping every field in my query results to every filed in the related POJOs, which feels like I’m doing the ORM work. That said, like Will Madison points out in his original Tweet thread, it’s a lot better for more involved queries.

What do I mean by more involved queries? I mean any query that involves joining multiple tables that doesn’t necessarily include all the fields from those tables. There’s also aggregation queries, or queries with subqueries. In short, any time you want to do more logic in your database operations than simple filtering. ORM query builders are an abstraction layer over the database’s actual query language. That make more sense if you were using the same ORM library for multiple types of databases, but how often is that going to happen? Generally speaking, if you have 2 different database, you’re also using 2 different repos, and 2 different ORM instances, if not 2 different ORMs altogether. In fact, this is such an edge case that I have never seen an ORM library that supports both relational and document-based databases.

It’s easy to appreciate the simplicity of ORM query generation – everything is right there in the code, as code. Everything about it just seems easier. Some tools can even remove all the work for you. Personally, I like reading and writing code that takes this approach because of that apparent simplicity and I like keeping everything in code, but it only works for the simplest CRUD queries. The reality is that ORMs like MyBatis work better as you ask your database to do more. As mildly irritating as it is to manually write queries and write the mappings, it actually keeps the more difficult work at the same level of difficulty of the easier work, which is the most you can ask for out of any ORM. The secret is to manage your expectations – namely, don’t expect your ORM to do everything for you. In fact, it’s usually better that it doesn’t.

 Posted by at 11:45 AM