If you ask a bunch of CEOs what their company’s greatest asset is, most will tell you it’s their people. That certainly sounds nice. It may even be true for companies that don’t deal with data. However, if a CEO who works for a company based around a website says people are her greatest asset, she is definitely lying. Great people got the company where it is today, but now that it’s a success, the most important asset is the data that has accumulated. If Edwards, your super-star coder, gets hit by a bus, it will take you six to eight weeks to train Henderson or Stevens or Erikson to replace him. However, if you have a data meltdown—one that creeps in slowly, undetected at first, until all your precious data is turned to garbage—be prepared to start over from scratch. Not even your backups can help, because they’re all corrupt, too.
The most important asset of a web-based company is its data. The most obvious type of data to protect is operational data. If you sell goods online, your site is useless if the product descriptions don’t match the products. If you run a social networking site, who will come back if the network links get lost, crossed, or lead to user pages that no longer exist? What good is an online personal information management tool if your to-do list items disappear before you get to check them off yourself?
Historical data corruption is another common and insidious problem. Imagine if you could no longer report on how many units of a particular widget you sold month over month last year simply because you no longer sell that widget today. Was the item’s database record deleted when the item was taken off the shelf, and now the historical data referencing it points to an empty record? Or what if data you think is important actually isn’t? In your hosted blogs site, are you reporting statistics of total comments added site-wide, but half of the comments are for entries that have long since been deleted by the author? Operational data changes with the times, but historical data that references yesterday’s operational data needs to be accessible and accurate today and tomorrow, too.
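One way to keep historical rows from pointing at an empty record is to let the database refuse the delete in the first place. The following is a minimal sketch, not the book's own schema: the `products` and `sales` tables and the `retire_product` helper are hypothetical names, and it uses Python's built-in `sqlite3` module rather than Rails so that it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE products (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Historical sales rows must always point at a real product.
    CREATE TABLE sales (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
        units      INTEGER NOT NULL,
        sold_on    TEXT NOT NULL
    );
    INSERT INTO products (id, name) VALUES (1, 'widget');
    INSERT INTO sales (product_id, units, sold_on) VALUES (1, 3, '2007-03-14');
""")


def retire_product(conn, product_id):
    """Try to delete a product; return True only if the database allowed it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM products WHERE id = ?", (product_id,))
        return True
    except sqlite3.IntegrityError:
        # The database refused: sales history still references this product.
        return False


deleted = retire_product(conn, 1)

# Last year's report still works because the product row is still there.
units_sold = conn.execute(
    "SELECT SUM(s.units) FROM sales s JOIN products p ON p.id = s.product_id"
).fetchone()[0]
print(deleted, units_sold)
```

With the constraint in place, taking an item off the shelf has to be modeled some other way (for example, by flagging the row as discontinued), which is exactly the kind of decision the data model should force you to make.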
Most web framework books teach you how to add data to your database, but they don’t teach you how to protect it. This book picks up where those books leave off. This chapter is intended to help you frame the way you think about databases. Databases are a major part of your entire architecture, not just a place to store application data. The next four chapters show you how to design a solid data model incrementally and how to tightly integrate it with Rails.
We tend to think of a web framework as the solution to all problems. Rails especially tends to abstract other pieces of web architecture away so that Rails itself seems to be the only piece of the puzzle. This is especially true of how Rails abstracts away the database. Rails now ships with SQLite as the default database, so you barely have to think about setting up a database at all. Next, the task of writing DDL has been buried behind migrations. DML, the bread and butter of SQL queries, is abstracted away behind ActiveRecord. Finally, the task of maintaining data integrity is left to ActiveRecord validations.
The problem with abstracting to this degree is that it requires that you make a few assumptions that are unlikely to be true.
There are many frameworks out there besides Rails. There’s PHP/Cake, Drupal, Django, Struts, Perl/Mason, etc. The list goes on and on. If you’re lucky, you’re rewriting your legacy PHP or Java application in Rails right now. If so, one problem you now face while you’re busy implementing the latest JavaScript interface magic is remembering all of those special cases and boundary conditions that led to bugs in your legacy PHP system. It took the previous engineers years to stamp out each pesky software bug, and you have to replicate all of this intricate logic again while also rewriting the interface from scratch so that the new site is 10 times snappier than the old one. Maybe you are painstakingly meticulous and everything turns out all right. But what happens in the next iteration when you switch to the yet newer, more whiz-bang framework? Hopefully your next framework is the next version of Rails, but you get the idea.
Software is constantly in flux, but the data you collect over the years is not. Wouldn’t it be nice if you could ensure the integrity of your data without concern for the current software stack sitting on top?
The plain and simple truth is that software has bugs. Your application code will change much more frequently than your database schema. When you add new columns to a database table, it’s very easy to forget to add all the appropriate ActiveRecord validations. It’s also easy to comment out well-intentioned validations but then forget to uncomment them. Finally, there are lots of scenarios for which no ActiveRecord validations exist in the first place (referential integrity constraints being the prime example), so relying solely on ActiveRecord validations to maintain your data’s integrity is simply a recipe for disaster. On the other hand, built-in mechanisms of an RDBMS can make protecting your data easy and worry-free. Accept that your application will have bugs, and leave it up to the data layer to be the final gatekeeper of what is allowed to enter the database.
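To make the "final gatekeeper" idea concrete, here is a small sketch in which the database itself rejects rows that buggy application code might try to write. The `line_items` table, its columns, and the `try_insert` helper are assumptions made up for illustration, and Python's built-in `sqlite3` module stands in for whatever software stack sits on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line_items (
        id          INTEGER PRIMARY KEY,
        quantity    INTEGER NOT NULL CHECK (quantity > 0),
        price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
    );
""")


def try_insert(conn, quantity, price_cents):
    """Attempt an insert; return True only if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO line_items (quantity, price_cents) VALUES (?, ?)",
                (quantity, price_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # A NOT NULL or CHECK constraint rejected the row.
        return False


results = [
    try_insert(conn, 2, 1999),  # valid row
    try_insert(conn, 0, 1999),  # buggy code computed a zero quantity
    try_insert(conn, 1, None),  # a new code path forgot to set the price
    try_insert(conn, 1, -500),  # negative price
]
stored = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
print(results, stored)
```

None of the bad rows depend on an application-level check being present, uncommented, or up to date. The constraints travel with the schema.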
The next assumption is that the application you are writing is the only application that will ever access the data you are storing. Forget about wholesale framework switches here. As your application grows, you will add myriad scripts that run scheduled maintenance tasks to clean up or to summarize data. You will write quick-and-dirty tools that live outside of your website’s main code base. You will even (probably more frequently than you expect) access the database directly through a database client and manipulate your data with raw SQL queries.
In all of these scenarios, you are likely to be bypassing your ActiveRecord validations. Therefore, it’s necessary to rethink the main function of these validations. Since the scope of the validations is only the application in which they reside, they cannot possibly be relied upon to protect your data from other rogue programs, or even from a well-intentioned developer sitting in front of a SQL prompt. What validations do well is drive an interface that gives the user helpful feedback when bad input is rejected. And that’s the key: validations do not safeguard data. They can be bypassed, turned off, or easily deleted. Only the data layer itself can safeguard your data.
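The difference between the two roles can be sketched as follows. This is an illustration under stated assumptions, not Rails code: `validate_email` is a hypothetical stand-in for an application-level validation, `raw_insert` stands in for a maintenance script or a developer at a SQL prompt, and both table names are invented. It uses Python's built-in `sqlite3` module so that it runs without any framework.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Integrity left entirely to the application.
    CREATE TABLE users_unprotected (email TEXT);
    -- Integrity enforced by the database itself.
    CREATE TABLE users_protected (
        email TEXT NOT NULL CHECK (email LIKE '%_@_%')
    );
""")


def validate_email(email):
    """Application-level check: returns a message for the user, or None if OK."""
    if not email or "@" not in email:
        return "Please enter a valid email address."
    return None


def raw_insert(conn, table, email):
    """Write directly to the database; no application validation runs here."""
    try:
        with conn:  # commits on success, rolls back on error
            # `table` is one of the two fixed names above, never user input.
            conn.execute(f"INSERT INTO {table} (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


# Inside the application, the validation produces helpful feedback.
feedback = validate_email("not-an-email")

# Outside the application, nothing calls validate_email at all.
slipped_through = raw_insert(conn, "users_unprotected", "not-an-email")
blocked = not raw_insert(conn, "users_protected", "not-an-email")
accepted = raw_insert(conn, "users_protected", "someone@example.com")
print(feedback, slipped_through, blocked, accepted)
```

The validation remains useful for telling a person what went wrong, while the `CHECK` and `NOT NULL` constraints are what actually keep the bad row out, no matter which program issues the write.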