15 September 2012

Choosing a Database

Choosing databases used to be pretty straightforward. The answer was always a SQL-based relational database. Nowadays the landscape is different.

I decided to use a non-relational database (referred to as NoSQL) for my next project. Why? Because I don't need it. The plan is to use denormalized read models to service UI views and use event sourcing for my domain objects. Neither of these things require relations at the database level. Further, with the use of a document database, I could even nest some relationship data that might be required for the view.

So which document database to choose? Firstly, my project will use HTML/JavaScript as my main UI, so a REST API into the database will save a lot of code. A document DB with a REST API pretty much narrows it down to CouchDB or RavenDB. What follows is my decision process for choosing between them for a .NET web-based application.

Note: Please correct me if you see any incorrect information!

-CouchDB is free and uses the Apache 2.0 license, which is not all that restrictive and can be used in commercial software.
-Raven is dual-licensed with the AGPL (which is the nastiest viral license out there) or a commercial license (which you pay for).
Being as my next project has no guaranteed revenue but I wanted to retain the rights to sell it commercially, paying for a database wasn't ideal when a free alternative exists, and I almost settled on Couch right here.

Note that Raven's AGPL license has an added exception that allows you to use a different OSI-approved open source license for your project that uses Raven and not have to convert to AGPL. But it doesn't free the users of your project from the AGPL's constraints.

I need a .NET client, primarily so the server can update read models.
-Raven has a built-in .NET client with support for LINQ queries, performing updates, and defining indexes in a .NET language. (And a lot more things that I put in the Developer Capabilities section.)
-CouchDB has only the REST API. There are many user-contributed libraries to wrapper the REST API, but none have the depth and capability of Raven's client. Non-trivial database instructions (like updates) must be written in Javascript, which is a bit awkward from a .NET language.
-Both databases have Web-based administration tools.

Developer Capabilities
-RavenDB as lots of advanced options: loading related documents (almost like a relational database), full-text queries, partial updates, set-based deletes and updates, transactions, spatial queries, ... the list goes on.
-CouchDB has more capability as an application server with design documents, validation-, and show-functions. But I don't need this, and I don't like this mix of concerns. Also, CouchDB is missing an important capability for my project; partial updates.

Deployment Stories
-RavenDB can be run as easily as referencing the DLL in my project and newing up the store (embedded mode), or can run as a Windows service, or an IIS application. RavenDB server is tied to the Windows platform (although the client can run on most anything through Mono). This would be a downside for some, but this is the platform I'm targeting.
-CouchDB is supported across many platforms, but it has basically one deployment story: run as a service (or daemon), with or without console interaction.

Distributed Stories
-RavenDB has replication and sharding support. Sharding is decided on the client, though.
-CouchDB has replication, load balancing, and clustering. The latter two are more for data-center scenarios, whereas Raven's sharding is capable of dealing with data locality (by geographic region, for instance) in addition to just hashing by key for data-center scenarios. Although, load-balancing and clustering are decided on by the server.

Based on my deciding factors, RavenDB is clearly the winner. Targeting the .NET platform, RavenDB brings much more capability to bear than CouchDB. It is also much easier to integrate with .NET code. Sharding support brings better scalability than replication for geographically distributed applications. The only downside is there is a cost for commercial use. However, that cost is well justified considering what it brings to the table, and the amount of integration time it saves with .NET. It's also still available for free to open source projects.


NaturalCause said...

Out of curiosity why did you look at RavenDB and CouchDB rather than RavenDB and MongoDB?

MongoDB has a .NET driver with LINQ support.


Note: I'm not trying to promote MongoDB as being better or anything, I prefer RavenDB. Just curious.

Kasey said...

Mainly due to the REST API. My primary client application will be an HTML5/Javascript app, and having a REST API directly into the database saves me writing a read layer. Also, perhaps I didn't look hard enough, but the only listed pricing I found for Mongo commercial licensing was cloud hosting.

Kasey said...

Followup: I found some pricing information from 10gen here. Only the subscription licenses have prices listed (the commercial tab says to contact them for pricing), and the prices listed are not all that compelling.