Delete or get off the pot

General Commentary, Programming

Jun 19, 2013

I have to be honest, I’ve never understood what part of “soft” deletes makes them a good idea. You know the basic gist: labeling an action “delete” and removing the item from display, when behind the scenes all you’re really doing is flipping a bit somewhere to tell your application to never show this thing again, as opposed to actually removing it. I suppose it makes sense for when you absolutely, positively, need to retain data (auditing purposes, court orders, etc.), but as far as general practices go, “delete” should mean “delete”, not “please just don’t show it to me again”.

The general idea behind soft deletes is that data is never really removed; it’s still there on your system in case you need it again. You could use this to go back and “recover” data a user didn’t mean to delete (although if you put friction into the deletion process, that will help prevent users from deleting data they didn’t want to in the first place). Like I mentioned before, this also lets you keep data you’re required by law to keep, even if the user has no more need for it. However, those are really the only reasons I’ve ever been able to think of for why you would want to soft delete instead of actually deleting something. But even then, surely a system that intercepted the delete request and copied the data elsewhere would work just as well, without having to deal with the fact that nothing is ever actually being deleted.
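To make the "intercept the request and copy the data elsewhere" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (notes, notes_archive), the columns, and the archive_and_delete helper are all made up for illustration; a real system might ship the copy to a separate database or cold storage instead.

```python
import sqlite3


def archive_and_delete(conn, note_id):
    """Copy one row into an archive table, then really delete it.

    Both statements run in a single transaction, so the row is never
    lost without a copy and never copied without being deleted.
    """
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO notes_archive (id, body, deleted_at) "
            "SELECT id, body, datetime('now') FROM notes WHERE id = ?",
            (note_id,),
        )
        conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))


# Hypothetical schema: a live table and a separate archive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE notes_archive (id INTEGER, body TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO notes (id, body) VALUES (1, 'keep me'), (2, 'delete me')")
conn.commit()

archive_and_delete(conn, 2)

live = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()
archived = conn.execute("SELECT id, body FROM notes_archive ORDER BY id").fetchall()
```

With something like this, the live table only ever holds rows the user still wants, so everyday queries need no special filtering, and the archive can live under whatever retention policy the auditors or lawyers require.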

One of my issues with “soft” deleting (as opposed to real deleting) is that you’re essentially lying to your users. After all, none of these apps have buttons labeled “Flag this in the database so we don’t show it to you again”; they all say “Delete”. Users tell your application to “delete”, thinking that the thing they wanted gone is actually being deleted and no longer exists. If your application says “Hide”, that’s one thing, but “delete” is something else: it means “****ing gone”, as in “what the data I was referring to when I said ‘Delete’ had better be”. Lying to your users is bad. Either delete their data, or let them know that your application is a roach motel: data goes in, but it’s never let back out.

Another problem with soft deletes is that you’re ultimately just wasting space. Stuff is never getting cleared out, so your data keeps growing with nothing keeping it in check. That means you’re ultimately going to be paying for space that is never going to do any good for the user, because they thought they had gotten rid of that data already. The data is stale, since it’s never being shown to the user again. And while everyone likes to talk about how well hardware costs scale per unit of data, letting your data run amok adds up fast.

Soft deletes don’t just have disk space and financial costs. All that data still in your system is crap that’s sitting on your disks, probably in your databases, and it has to be indexed or skipped past when querying said data for anything of value. You have to structure your queries to work around the crap in the database, either remembering to filter it out in every query, or having your queries go through a library that knows to filter out the crap records your hoarding policies won’t let go of. The first option is risky because there are tons of opportunities to forget to filter out deleted records or to do a bad job of it. The second option makes sure all queries filter out deleted records, but creates a new problem: the query you’re writing is now never the query actually being run against your database. That second option seems like a great idea at first, but just wait until you’re trying to test and debug queries. I can promise you from experience, failing to note in the log just what happened to your query can make figuring out why you’re not getting the results you think you should be getting take hours longer than it should.
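Both options can be sketched in a few lines of Python with the built-in sqlite3 and logging modules. The is_deleted column, the notes table, and the query_live helper are hypothetical names picked for this example, not any particular library's API.

```python
import logging
import sqlite3

log = logging.getLogger("softdelete")


def query_live(conn, table, where="1=1", params=()):
    """Hypothetical helper: always appends the soft-delete filter.

    The final SQL is logged, because it is no longer the query the
    caller wrote. `table` and `where` must come from trusted code,
    never from user input, since they are interpolated into the SQL.
    """
    sql = (
        f"SELECT id, body FROM {table} "
        f"WHERE ({where}) AND is_deleted = 0 ORDER BY id"
    )
    log.debug("executing: %s params=%r", sql, params)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    "id INTEGER PRIMARY KEY, body TEXT, is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO notes (id, body) VALUES (?, ?)",
    [(1, "visible"), (2, "hidden")],
)
conn.execute("UPDATE notes SET is_deleted = 1 WHERE id = 2")  # the "soft delete"

# Option one gone wrong: a hand-written query that forgot the filter.
leaky = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()

# Option two: the helper adds the filter, so the "deleted" row stays hidden.
filtered = query_live(conn, "notes")
```

The leaky query happily returns the "deleted" row, which is exactly the slip-up that is easy to make by hand. The helper avoids that, but the only place the real SQL shows up is the debug log line, so if that logging is missing or switched off, you are debugging a query you never actually wrote.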

If you really want to never actually delete your users’ data, there are a few things you need to make sure you do. First and foremost, you need to decide if it’s worth the wasted space to hold all this useless and unwanted data. This data is going to go stale once the user “deletes” it, simply because from that point on it’s abandoned and irrelevant. Next, you need to figure out how you’re going to keep this stale, abandoned, and irrelevant data away from the users, while at the same time making what’s actually going on visible in the logs, all in a manner that doesn’t let developers slip up easily and return “deleted” records. Lastly, you should be making it clear to users that they aren’t “deleting” anything. They’re “hiding”, “archiving”, or “removing from view”, but they’re never “deleting”. And once you go through all of that, there’s still one more thing you need to do, and that’s be able to explain to yourself why exactly all that hassle is worth it.