Cloudy Future in Store for Databases

March 22nd, 2010

There was lots of talk about cloud computing at the 2010 edition of TheServerSide Java Symposium in Las Vegas. The funny thing about software engineering is that nothing new seems to have been invented since about the time of the original Star Trek series. Everything just gets recycled and renamed (in fact, just like the Star Trek franchise). Take cloud computing, for example. There are some great things happening in this space, and new ideas about software engineering are emerging. But in essence, this is what we used to call grid computing in the 2000s and what CORBA wanted to achieve in the 1990s.

This time around, though, distributed computing, now called cloud computing (admit it, the name sounds great), is being packaged with philosophies about the right way to distribute and scale processing. Take Google and Amazon, for example. They offer cloud computing as a PaaS (platform as a service). You can forgo your hosting center and have your application hosted on their platform. But before you get on the cloud, you have to drink the Kool-Aid.

What’s in the Kool-Aid?
Google, with its App Engine, demands that you forget everything you know about relational databases and go instead for the proprietary BigTable, which sees data hierarchically. Data needs to be denormalized, with joins replaced by duplication, replication and partitioning. Relationships are enforced not by the persistence layer but at the application layer (this means you!). So you can have foreign keys that point nowhere, and your app needs to deal with it. Transactions are restricted to closely related data. Anything broader needs multiple transactions, plus compensating transactions in case of rollbacks. Essentially, you need to let go of the whole notion of strict consistency and opt instead for eventual consistency.
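To make "this means you!" concrete, here is a minimal Python sketch. It uses a plain dictionary as a stand-in for the datastore, so none of this is the App Engine API; the keys, the `author_name_for` helper and the `transfer_credit` helper are all made up for illustration. It shows the two chores that land on the application: coping with a reference that points nowhere, and undoing a first write by hand when a second, separate write fails.

```python
# Hypothetical in-memory store; a stand-in for a non-relational datastore.
store = {
    "author:1": {"name": "Ada"},
    "post:10": {"title": "Hello", "author_key": "author:1"},
    "post:11": {"title": "Orphan", "author_key": "author:99"},  # dangling reference
    "account:a": {"credit": 50},
    "account:b": {"credit": 0},
}


def author_name_for(post_key):
    """Resolve a post's author, tolerating a reference that points nowhere."""
    post = store[post_key]
    author = store.get(post["author_key"])  # nothing guarantees this exists
    return author["name"] if author is not None else "(unknown author)"


def transfer_credit(from_key, to_key, amount):
    """Two independent writes, with a compensating write if the second fails."""
    store[from_key]["credit"] -= amount  # step 1 commits on its own
    if to_key not in store:
        # Step 2 cannot happen, so undo step 1 by hand (the compensation).
        store[from_key]["credit"] += amount
        return False
    store[to_key]["credit"] += amount  # step 2 is a separate write
    return True


print(author_name_for("post:10"))
print(author_name_for("post:11"))
print(transfer_credit("account:a", "account:missing", 10), store["account:a"])
```

In a relational database, the foreign key constraint and a single transaction would do both jobs for you. Here they are ordinary application code that somebody has to write and test.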

The sacrifices made here pay off in terms of scaling out applications. Google claims that it can guarantee a consistent response time regardless of the size of the data set being queried. In other words, if a query returns 100 items in its result set, the response time will stay the same even as the table grows. Just try to get that kind of response time with your relational databases, with all your normalized data and the joins that come with it. With relational databases, you tend to have faster writes but slower reads because, typically, you need to re-join tables on the read. Google’s approach, with denormalization and data replication, has the opposite result (slower writes but faster reads), a trade-off that makes sense most of the time.
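The read/write trade-off is easy to see in a toy example. The sketch below is plain Python over in-memory lists, not any real database, and the table and function names are invented for illustration. The normalized layout does a lookup per row on every read but renames an author with one write; the denormalized layout reads each row as-is but has to touch every copy on a rename.

```python
# Normalized layout: the author name lives in one place, so reads must "join".
authors = {1: {"name": "Ada"}}
posts = [
    {"id": 10, "title": "Hello", "author_id": 1},
    {"id": 11, "title": "World", "author_id": 1},
]


def read_normalized():
    # One extra lookup per row, paid on every read.
    return [(p["title"], authors[p["author_id"]]["name"]) for p in posts]


def rename_normalized(author_id, new_name):
    authors[author_id]["name"] = new_name
    return 1  # number of rows written


# Denormalized layout: the author name is copied into every post.
posts_denorm = [
    {"id": 10, "title": "Hello", "author_id": 1, "author_name": "Ada"},
    {"id": 11, "title": "World", "author_id": 1, "author_name": "Ada"},
]


def read_denormalized():
    # No lookup: everything the read needs is already in the row.
    return [(p["title"], p["author_name"]) for p in posts_denorm]


def rename_denormalized(author_id, new_name):
    written = 0
    for p in posts_denorm:
        if p["author_id"] == author_id:
            p["author_name"] = new_name  # every copy must be updated
            written += 1
    return written  # number of rows written


print(rename_normalized(1, "Grace"), rename_denormalized(1, "Grace"))
```

With two posts the difference is trivial. With millions of copies spread over many machines, the rename becomes the expensive operation, and a reader may see the old name until every copy has been updated.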

The cloud commandments
The cloud intelligentsia believes in the following tenets:
Scale first – not last: This means that you bake scalability into your app from the outset instead of the traditional approach where performance is considered at the end of the development cycle. (Kinda flies in the face of the YAGNI principle.) Persistence figures prominently in the overall equation.
NoSQL (Not Only SQL): Not everything needs to be stored in a relational database. Moving some data out of the database takes some of the processing load off it.
Consistency, Availability, Partition-tolerance – pick two: Consistency means that every read sees the most recent write, so related data stays coherent across nodes. Availability means that every request gets a response, even when some nodes have failed. Partition-tolerance means that the system keeps working when the network between nodes drops or delays messages. You can favor any two of these, or settle for a little of each, but realize that they pull against each other. You pick two and sacrifice the third.
If these concepts are baked into your application, you are ready for PaaS.
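The "pick two" tenet is easier to picture with a toy model. Taking a partition to mean a network split between replicas, the sketch below models a two-replica store in plain Python. The `TwoNodeStore` class and its "CP" and "AP" modes are illustrative names, not any product's API. While the split lasts, the store must either refuse writes (keeping consistency, losing availability) or accept them locally (keeping availability, losing consistency).

```python
class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (illustrative names)."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, node_index, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the other replica falls behind.
            self.nodes[node_index].data[key] = value
            return
        for node in self.nodes:  # healthy network: replicate everywhere
            node.data[key] = value

    def read(self, node_index, key):
        return self.nodes[node_index].data.get(key)


ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
print(ap.read(0, "x"), ap.read(1, "x"))  # the two replicas now disagree
```

An eventually consistent system is the "AP" branch plus a repair step, not shown here, that reconciles the replicas once the network heals.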

Outsourcing the DBAs
Amazon has an interesting twist to its offering. Rather than being entirely cloud-based, your app can be hosted in your own environment and still access Amazon's cloud-based services. For example, Amazon offers the Simple Storage Service (S3), a cloud-based storage system wrapped inside a RESTful API. You can outsource all of your database needs, stop worrying about hosting the storage and its infrastructure costs, and pay only for what you use. This is great because it makes it someone else’s headache. IT departments today need a small army of people to keep database servers up and running. In typical production environments, there are frequent database connectivity issues that cause outages. There are locks that cause applications to freeze, and there are inexplicable database server crashes that corrupt user data. How does cloud computing make those troubles go away? Just drink the Kool-Aid, man, and you’ll be fine. Tough sell.
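"Wrapped inside a RESTful API" means that each stored object has a URL and that ordinary HTTP verbs act on it: PUT to store, GET to fetch, DELETE to remove. The Python sketch below only builds such requests and never sends them. The bucket name and key are made up, the `object_url` and `build_request` helpers are mine, and the request signing that S3 normally requires is left out entirely, so treat it as a picture of the idea rather than a working client.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(bucket, key):
    # Virtual-hosted-style address: the bucket is the host, the key is the path.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def build_request(method, bucket, key, body=None):
    # Builds the HTTP request object only; nothing goes over the network.
    return Request(object_url(bucket, key), data=body, method=method)


# Hypothetical bucket and key, for illustration only.
put = build_request("PUT", "example-bucket", "reports/2010 q1.csv", b"a,b\n1,2\n")
get = build_request("GET", "example-bucket", "reports/2010 q1.csv")

print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

In practice you would use Amazon's SDK, which takes care of authentication, retries and errors.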

If you can strip away the hype from cloud computing and take everything with a grain of salt, you’re left with interesting ideas. Still, I don’t think this technology is for everybody, as it requires big sacrifices. Getting rid of relational databases is a big price to pay unless your app is destined to become the next Facebook. As with all other hyped technologies, some architects will see to it that their app is on cloud steroids. When the dust settles, we will all have learned something about how not to engineer an app.