Wanted: A kickass .NET API for CouchDB

Over the past couple of weeks I’ve been playing with Apache CouchDB. If you haven’t seen it yet, you should check it out — it’s a document database that stores data as a flat collection of JSON objects (aka documents) with no schema, no normalization, no indexes, and no JOINs.

Communication is via RESTful HTTP commands — for example:

GET http://127.0.0.1:5984/books/42

…for retrieving the document with ID ‘42’ from a database named ‘books’. This would return a JSON document like the following:

{
  "_id":"42",
  "_rev":"946B7D1C",
  "Title":"Harry Potter",
  "Author":{
    "Initials":"JK",
    "Surname":"Rowling"
  },
  "Pages":"436",
  "Tags":["wizards", "children", "magic"]
}

It sounds pretty crazy if you’re used to big relational databases, but it’s an important shift that is becoming more and more popular with the likes of Google BigTable and Amazon SimpleDB. The fact is, if you’re not writing a transactional application where normalization is a priority, do you really need all the complexity of a relational database? Many common web applications like blogs and CMSs and Twitter could work quite happily with databases like CouchDB. Check out these two great articles that explain it further:

  • Assaf Arkin: CouchDB: Thinking beyond the RDBMS
  • Infoworld: Slacker databases break all the old rules

Anyway, CouchDB is still quite young, and hasn’t seen a lot of attention outside the Ruby and Python communities. This sucks. I would like to see a well-designed open-source library for .NET applications using CouchDB for persistence. Here are some features such a library might provide over and above raw HTTP WebRequests:

Requirement #1: Generics and automatic serialization/deserialization of .NET objects

At the time of writing, I can only find one CouchDB library available for .NET. It’s called SharpCouch, and it only works at the string level:

string json = db.GetDocument("127.0.0.1", "books", "42");
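Underneath any such library, a string-level call is a single HTTP GET. The sketch below shows roughly what that looks like with .NET's `HttpClient`; it is not SharpCouch's code, the `CouchHttp` class and its method names are hypothetical, and it assumes CouchDB is listening on its default port, 5984.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical string-level client: one GET, raw JSON back.
public static class CouchHttp
{
    // A single shared HttpClient rather than one per request.
    private static readonly HttpClient Client = new HttpClient();

    // Builds e.g. http://127.0.0.1:5984/books/42 (5984 is CouchDB's default port).
    public static Uri DocumentUri(string host, string database, string id) =>
        new Uri($"http://{host}:5984/{Uri.EscapeDataString(database)}/{Uri.EscapeDataString(id)}");

    // Returns the document body as a JSON string; throws HttpRequestException on non-2xx.
    public static async Task<string> GetDocumentJsonAsync(string host, string database, string id)
    {
        using HttpResponseMessage response = await Client.GetAsync(DocumentUri(host, database, id));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Everything the caller gets back is a string, so parsing it and checking for errors is left to application code.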

Instead, it shouldn’t be too hard to use something like JSON.NET and generics to automatically map between JSON strings and live .NET objects:

Book book = db.GetDocument<Book>("127.0.0.1", "books", "42");
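A minimal sketch of that generic layer follows. The post suggests JSON.NET (where the equivalent call is `JsonConvert.DeserializeObject<Book>(json)`); to keep the example dependency-free it swaps in System.Text.Json, built into .NET Core 3.0 and later. `CouchDatabase` and `FromJson<T>` are hypothetical names, and the `Book`/`Author` classes are shaped after the sample document above.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Domain classes shaped like the sample "books" document.
// Unknown members such as _id and _rev are ignored on deserialization by default.
public class Author
{
    public string Initials { get; set; }
    public string Surname { get; set; }
}

public class Book
{
    public string Title { get; set; }
    public Author Author { get; set; }
    public string Pages { get; set; }      // a string in the sample JSON
    public List<string> Tags { get; set; }
}

// Hypothetical generic wrapper: fetch the JSON as before, then map it to T.
public class CouchDatabase
{
    private static readonly HttpClient Client = new HttpClient();

    public async Task<T> GetDocumentAsync<T>(string host, string database, string id)
    {
        string json = await Client.GetStringAsync($"http://{host}:5984/{database}/{id}");
        return FromJson<T>(json);
    }

    // Property names match the JSON exactly, so the default (case-sensitive) options suffice.
    public static T FromJson<T>(string json) => JsonSerializer.Deserialize<T>(json);
}
```

The domain class stays a plain object; only the wrapper knows that the data came from CouchDB.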

Requirement #2: Mapping RESTful HTTP error codes to .NET exceptions

Being a REST app, CouchDB uses HTTP status codes to indicate different states of success or failure, e.g. 404 when a document doesn’t exist. Such errors should be mapped from raw WebExceptions into nicer types like DocumentNotFoundException and DocumentConflictException.
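A minimal sketch of that mapping, assuming the status code has already been read from the failed response (with the classic `WebException` API, when the server did respond, that is `((HttpWebResponse)ex.Response).StatusCode`). The two exception names come from the post; the `CouchException` base class and the `CouchErrors.FromStatus` helper are hypothetical.

```csharp
using System;
using System.Net;

// Base type for hypothetical CouchDB errors, carrying the HTTP status.
public class CouchException : Exception
{
    public HttpStatusCode StatusCode { get; }

    public CouchException(HttpStatusCode statusCode, string message) : base(message)
    {
        StatusCode = statusCode;
    }
}

// 404: the database or document does not exist.
public class DocumentNotFoundException : CouchException
{
    public DocumentNotFoundException(string message) : base(HttpStatusCode.NotFound, message) { }
}

// 409: the update carried a stale (or missing) _rev.
public class DocumentConflictException : CouchException
{
    public DocumentConflictException(string message) : base(HttpStatusCode.Conflict, message) { }
}

public static class CouchErrors
{
    // Translates a failed response into a typed exception for the caller to throw.
    public static CouchException FromStatus(HttpStatusCode statusCode, string responseBody) =>
        statusCode switch
        {
            HttpStatusCode.NotFound => new DocumentNotFoundException(responseBody),
            HttpStatusCode.Conflict => new DocumentConflictException(responseBody),
            _ => new CouchException(statusCode, responseBody)
        };
}
```

A library would then `throw CouchErrors.FromStatus(status, body);`, letting application code catch `DocumentConflictException` specifically and decide whether to reload and retry.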

Requirement #3: Implicit Optimistic Offline Lock via document revisions

Optimistic Offline Lock is a pattern used to prevent concurrency problems when two people edit the same piece of data at the same time. Basically, instead of locking an object while one person uses it, you allow multiple users access at the same time, but before saving, each user checks whether the data has changed since they read it.

Optimistic Offline Lock is usually implemented via versioning. As well as an ID, each object has a version number that gets automatically incremented each time the object is modified. Before a client saves changes to the object, it checks whether the version number has changed since it retrieved the object. If it has, then you know someone else has modified the object in the meantime, and your copy is now stale.
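To make the version check concrete, here is a toy in-memory store (not CouchDB, and not thread-safe) in which every save must quote the version the caller originally loaded. `VersionedStore` and `ConcurrencyException` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public class ConcurrencyException : Exception
{
    public ConcurrencyException(string message) : base(message) { }
}

// Toy store: each row carries a version that is bumped on every successful save.
public class VersionedStore
{
    private readonly Dictionary<string, (int Version, string Data)> rows = new();

    public (int Version, string Data) Load(string id) => rows[id];

    // New rows start at version 1.
    public int Insert(string id, string data)
    {
        rows.Add(id, (1, data));
        return 1;
    }

    // Saves only if nobody else has saved since the caller loaded expectedVersion.
    public int Save(string id, int expectedVersion, string data)
    {
        var current = rows[id];
        if (current.Version != expectedVersion)
            throw new ConcurrencyException(
                $"'{id}' is at version {current.Version}, but you loaded version {expectedVersion}.");
        rows[id] = (current.Version + 1, data);
        return current.Version + 1;
    }
}
```

If Alice and Bob both load version 1, whoever saves first wins and moves the row to version 2; the other save is rejected rather than silently overwriting the first.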

While this is optional in most relational databases (for example, if you use a rowversion column in SQL Server), document versioning is a non-negotiable feature of CouchDB: to update a document, you have to know its ID and its revision number.
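On the wire, an update is a PUT of the full document with the current `_rev` included in the JSON body; if that revision is stale, CouchDB responds with 409 Conflict. The hypothetical helper below builds such a request using System.Text.Json's `JsonNode` API (.NET 6 or later), adding `_rev` to the serialized object rather than requiring a property on the class.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class CouchUpdates
{
    // Hypothetical helper: builds the PUT that replaces an existing document.
    public static HttpRequestMessage BuildUpdate(Uri documentUri, string rev, object document)
    {
        // Serialize the domain object (using its runtime type), then add _rev to the JSON.
        JsonObject body = JsonSerializer.SerializeToNode(document)!.AsObject();
        body["_rev"] = rev;

        return new HttpRequestMessage(HttpMethod.Put, documentUri)
        {
            Content = new StringContent(body.ToJsonString(), Encoding.UTF8, "application/json")
        };
    }
}
```

The caller still has to keep track of `rev` somewhere between the GET and the PUT.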

The easiest way of keeping track of this would simply be to add a _rev property to all your domain entities:

public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
    public string Publisher { get; set; }
    public string _rev { get; set; } // hmm
}

Actually though, I’d prefer not to worry about revision numbers at all. They are only required by CouchDB, and I don’t want to let such concerns leak into my domain model. Instead, document revision numbers should be stored in a separate Identity Map — a list of references to all hydrated database objects currently active in my application.
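A minimal sketch of such an identity map, assuming the library registers every object it hydrates together with the `_id` and `_rev` it came from. Entities are keyed by reference (`ReferenceEqualityComparer`, .NET 5 or later), so the domain class needs neither a `_rev` property nor a custom `Equals`. `IdentityMap` and its members are hypothetical names.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical identity map: tracks CouchDB keys outside the domain model.
public class IdentityMap
{
    // id -> the single live instance for that document
    private readonly Dictionary<string, object> entitiesById = new();

    // instance -> (_id, _rev), compared by reference, not by Equals
    private readonly Dictionary<object, (string Id, string Rev)> keysByEntity =
        new(ReferenceEqualityComparer.Instance);

    // Called by the library right after it deserializes a document.
    public void Register(string id, string rev, object entity)
    {
        entitiesById[id] = entity;
        keysByEntity[entity] = (id, rev);
    }

    // Returns the already-loaded instance, so the same id yields the same object.
    public T? Find<T>(string id) where T : class =>
        entitiesById.TryGetValue(id, out var entity) ? entity as T : null;

    // Looked up when saving: the PUT needs both values.
    public (string Id, string Rev) KeysFor(object entity) => keysByEntity[entity];

    // Called after a successful save with the new revision CouchDB returned.
    public void UpdateRevision(object entity, string newRev)
    {
        var (id, _) = keysByEntity[entity];
        keysByEntity[entity] = (id, newRev);
    }
}
```

CouchDB's response to a successful PUT includes the new `rev`, which is what a library would feed to `UpdateRevision`. Because the map holds strong references, a real implementation would probably scope it to a unit of work rather than the whole application lifetime.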

Requirement #4: LINQ expressions for Map/Reduce functions

CouchDB uses map/reduce functions expressed in JavaScript to define its views. For example:

map: function(doc) {
  emit(doc.NumberOfPages, doc.NumberOfPages);
}
reduce: function(keys, values, rereduce) {
  return sum(values);
}

This seems like a pretty good fit for LINQ expressions, no? Perhaps something like:

using System.Collections.Generic;
using System.Linq;

public class BooksByNumberOfPages : IMapReduce<Book, int>
{
    public IEnumerable<int> Map(IEnumerable<Book> books)
    {
        return books.Select(b => b.NumberOfPages);
    }

    public int Reduce(IEnumerable<int> keys, IEnumerable<Book> values)
    {
        return keys.Sum();
    }
}

This is not strictly required, and indeed probably wouldn’t get used that much because CouchDB views are very static, but it could be a nice bit of syntactic sugar.
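One cheap win from expressing views this way is that they can be evaluated in memory, for example in unit tests. The sketch below assumes a hypothetical `IMapReduce<TDoc, TKey>` interface matching the class above and runs a view over a plain list; translating the expressions into CouchDB JavaScript is the hard part and is not attempted here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface matching the proposed BooksByNumberOfPages signature.
public interface IMapReduce<TDoc, TKey>
{
    IEnumerable<TKey> Map(IEnumerable<TDoc> docs);
    TKey Reduce(IEnumerable<TKey> keys, IEnumerable<TDoc> values);
}

public class Book
{
    public string Title { get; set; } = "";
    public int NumberOfPages { get; set; }
}

public class BooksByNumberOfPages : IMapReduce<Book, int>
{
    public IEnumerable<int> Map(IEnumerable<Book> books) => books.Select(b => b.NumberOfPages);
    public int Reduce(IEnumerable<int> keys, IEnumerable<Book> values) => keys.Sum();
}

public static class LocalViews
{
    // Applies map, then reduce, to an in-memory collection (no CouchDB involved).
    public static TKey Run<TDoc, TKey>(IMapReduce<TDoc, TKey> view, IReadOnlyCollection<TDoc> docs) =>
        view.Reduce(view.Map(docs), docs);
}
```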

Requirement #5: ID generators like NHibernate

CouchDB encourages you to provide your own IDs for documents — otherwise it will give you big fat ugly GUIDs. I would like to see ID generators similar to NHibernate’s hi/lo algorithm, which produces human-readable and approximately-sequential IDs.
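A sketch of the general hi/lo idea (not NHibernate's actual implementation): the generator occasionally reserves a fresh "hi" value from some shared source, then hands out `maxLo` ids locally before it needs another. The `nextHi` delegate is an assumption; in a CouchDB setting it might increment a counter document and retry on 409 Conflict, which is not shown.

```csharp
using System;

// Sketch of hi/lo: one round trip to the shared source per block of maxLo ids.
public class HiLoIdGenerator
{
    private readonly Func<long> nextHi;   // hypothetical source of unique hi values
    private readonly int maxLo;           // ids handed out per hi value
    private readonly string prefix;       // optional readable prefix, e.g. "book-"
    private readonly object gate = new object();
    private long hi;
    private int lo;

    public HiLoIdGenerator(Func<long> nextHi, int maxLo, string prefix = "")
    {
        if (maxLo <= 0) throw new ArgumentOutOfRangeException(nameof(maxLo));
        this.nextHi = nextHi;
        this.maxLo = maxLo;
        this.prefix = prefix;
        lo = maxLo; // forces a hi reservation on the first call
    }

    public string NextId()
    {
        lock (gate)
        {
            if (lo >= maxLo)
            {
                hi = nextHi(); // the only shared round trip, once per maxLo ids
                lo = 0;
            }
            long id = hi * maxLo + lo;
            lo++;
            return prefix + id;
        }
    }
}
```

With `maxLo` set to 100, a process that reserves hi = 3 can issue ids 300 to 399 without another round trip. The ids stay short and readable, and they are only approximately sequential because several processes each work through their own block.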

Final words

From these ideas, revision tracking with an identity map is clearly the most important. In fact, this was the reason I decided not to use CouchDB for a recent project to which it was well suited (because I didn’t have time to write a persistence framework for it first). Long term, however, CouchDB has a lot to offer the .NET community, and combined with officially-maintained ports for Windows, a well-supported CouchDB .NET library could benefit many projects.