Wanted: A kickass .NET API for CouchDB

Over the past couple of weeks I’ve been playing with Apache CouchDB. If you haven’t seen it yet, you should check it out — it’s a document database that stores data as a flat collection of JSON objects (aka documents) with no schema, no normalization, no indexes, and no JOINs.

Communication is via RESTful HTTP commands — for example:

GET http://127.0.0.1:5984/books/42

…which retrieves the document with ID ‘42’ from a database named ‘books’. This would return the following JSON document:

{
  "_id":"42",
  "_rev":"946B7D1C",
  "Title":"Harry Potter",
  "Author":{
    "Initials":"JK",
    "Surname":"Rowling"
  },
  "Pages":"436",
  "Tags":["wizards", "children", "magic"]
}

It sounds pretty crazy if you’re used to big relational databases, but it’s an important shift that is becoming more and more popular with the likes of Google BigTable and Amazon SimpleDB. The fact is, if you’re not writing a transactional application where normalization is a priority, do you really need all the complexity of a relational database? Many common web applications like blogs and CMSs and Twitter could work quite happily with databases like CouchDB. Check out these two great articles that explain it further:

Anyway, CouchDB is still quite young, and hasn’t seen a lot of attention outside the Ruby and Python communities. This sucks. I would like to see a well-designed open-source library for .NET applications using CouchDB for persistence. Here are some features such a library might provide over and above raw HTTP WebRequests:

Requirement #1: Generics and automatic serialization/deserialization of .NET objects

At the time of writing, I can only find one CouchDB library available for .NET. It’s called SharpCouch, and it only works at the string level:

string json = db.GetDocument("127.0.0.1", "books", "42");

Instead, it shouldn’t be too hard to use something like JSON.NET and generics to automatically map between JSON strings and live .NET objects:

Book book = db.GetDocument<Book>("127.0.0.1", "books", "42");
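
Here is a minimal sketch of how that could work, not a finished design. It uses System.Text.Json from the .NET base class library rather than JSON.NET, purely to avoid an extra dependency. CouchDatabase, GetDocumentAsync and Parse are names I made up for illustration, the host and database are passed once to the constructor instead of on every call, and the Book class is trimmed to a few fields of the sample document above.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public class Author
{
    public string Initials { get; set; }
    public string Surname { get; set; }
}

public class Book
{
    public string Title { get; set; }
    public Author Author { get; set; }
    public string[] Tags { get; set; }
}

// Hypothetical wrapper: one instance per CouchDB database.
public class CouchDatabase
{
    private readonly HttpClient http;
    private readonly string baseUrl;

    public CouchDatabase(HttpClient http, string host, string database)
    {
        this.http = http;
        this.baseUrl = $"http://{host}:5984/{database}/";
    }

    // Fetches the raw JSON over HTTP, then hands it to the generic parser.
    public async Task<T> GetDocumentAsync<T>(string id)
    {
        string json = await http.GetStringAsync(baseUrl + Uri.EscapeDataString(id));
        return Parse<T>(json);
    }

    // Kept separate from the HTTP call so it can be exercised without a server.
    public static T Parse<T>(string json)
    {
        return JsonSerializer.Deserialize<T>(json);
    }
}
```

With this shape, a call would read something like `Book book = await db.GetDocumentAsync<Book>("42");`. By default System.Text.Json matches property names case-sensitively and ignores JSON members that have no matching property, so `_id` and `_rev` are simply dropped here.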

Requirement #2: Mapping RESTful HTTP error codes to .NET exceptions

Being a REST app, CouchDB uses HTTP status codes to indicate different states of success or failure, e.g. 404 when a document doesn’t exist. Such errors should be mapped from raw WebExceptions into nicer types like DocumentNotFoundException and DocumentConflictException.
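
As a rough sketch of that mapping: the exception types below are hypothetical (they are the names I suggested above, not types from any existing library), and the helper assumes the caller has already pulled the HTTP status code out of the response or the WebException. CouchDB answers 404 for a missing document and 409 for an update conflict.

```csharp
using System;
using System.Net;

// Hypothetical base type for every CouchDB-related failure.
public class CouchException : Exception
{
    public CouchException(HttpStatusCode status, string documentId)
        : base($"CouchDB returned {(int)status} for document '{documentId}'.")
    {
        Status = status;
        DocumentId = documentId;
    }

    public HttpStatusCode Status { get; }
    public string DocumentId { get; }
}

public class DocumentNotFoundException : CouchException
{
    public DocumentNotFoundException(string documentId)
        : base(HttpStatusCode.NotFound, documentId) { }
}

public class DocumentConflictException : CouchException
{
    public DocumentConflictException(string documentId)
        : base(HttpStatusCode.Conflict, documentId) { }
}

public static class CouchErrors
{
    // Translates a status code into the most specific exception available.
    // Success and redirect codes fall through without throwing.
    public static void ThrowIfFailed(HttpStatusCode status, string documentId)
    {
        switch (status)
        {
            case HttpStatusCode.NotFound:
                throw new DocumentNotFoundException(documentId);
            case HttpStatusCode.Conflict:
                throw new DocumentConflictException(documentId);
        }

        if ((int)status >= 400)
        {
            throw new CouchException(status, documentId);
        }
    }
}
```

Deriving both specific types from one base class lets callers catch a single CouchException when they don't care which failure occurred.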

Requirement #3: Implicit Optimistic Offline Lock via document revisions

Optimistic Offline Lock is a pattern used to prevent concurrency problems when two people edit the same piece of data at the same time. Instead of locking an object while one person uses it, you allow multiple users access at the same time, but each user checks whether the object has changed before modifying it.

Optimistic Offline Lock is usually implemented via versioning. As well as an ID, each object has a version number that gets automatically incremented each time the object is modified. Before a client saves changes to the object, it checks to see if the version number has changed since it retrieved it. If it has, then you know someone else has modified the object in the meantime, and your copy is now stale.
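
To make the version check concrete, here is a tiny in-memory sketch. VersionedStore and StaleObjectException are hypothetical names, and a real system would do the comparison inside the database rather than in a dictionary, so treat this as an illustration of the pattern only.

```csharp
using System;
using System.Collections.Generic;

// Thrown when the caller's copy is older than the stored one.
public class StaleObjectException : Exception
{
    public StaleObjectException(string id)
        : base($"Object '{id}' was modified by someone else.") { }
}

public class VersionedStore<T>
{
    private readonly Dictionary<string, (int Version, T Value)> rows =
        new Dictionary<string, (int Version, T Value)>();

    public (int Version, T Value) Load(string id)
    {
        return rows[id];
    }

    // Saves only if expectedVersion matches what is stored (0 means "new").
    // Returns the new version number.
    public int Save(string id, int expectedVersion, T value)
    {
        int current = rows.TryGetValue(id, out var row) ? row.Version : 0;
        if (current != expectedVersion)
        {
            throw new StaleObjectException(id);
        }

        rows[id] = (current + 1, value);
        return current + 1;
    }
}
```

Two clients that both loaded version 1 can't both save: the first Save bumps the version to 2, and the second, still passing 1, gets a StaleObjectException.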

While this is optional in most relational databases (for example, via a rowversion column in SQL Server), document versioning is a non-negotiable feature of CouchDB: to update a document, you have to know both its ID and its revision number.

The easiest way of keeping track of this would simply be to add a _rev property to all your domain entities:

public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
    public string Publisher { get; set; }
    public string _rev { get; set; } // hmm
}

Actually though, I’d prefer not to worry about revision numbers at all. They are only required by CouchDB, and I don’t want to let such concerns leak into my domain model. Instead, document revision numbers should be stored in a separate Identity Map — a list of references to all hydrated database objects currently active in my application.
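
One possible shape for that, sketched under a few assumptions: RevisionMap and DocumentInfo are hypothetical names, and I've used a ConditionalWeakTable as the side table so that tracking an entity by reference doesn't keep it alive after the application drops it. A fuller identity map would probably also index loaded objects by document ID.

```csharp
using System;
using System.Runtime.CompilerServices;

// The CouchDB bookkeeping that stays out of the domain model.
public sealed class DocumentInfo
{
    public DocumentInfo(string id, string rev)
    {
        Id = id;
        Rev = rev;
    }

    public string Id { get; }
    public string Rev { get; }
}

public sealed class RevisionMap
{
    // Keys are compared by reference and held weakly.
    private readonly ConditionalWeakTable<object, DocumentInfo> entries =
        new ConditionalWeakTable<object, DocumentInfo>();

    // Call after loading or saving an entity to record its latest revision.
    public void Track(object entity, string id, string rev)
    {
        entries.Remove(entity);
        entries.Add(entity, new DocumentInfo(id, rev));
    }

    // Call before an update to find the ID and revision to send to CouchDB.
    public DocumentInfo Lookup(object entity)
    {
        if (entries.TryGetValue(entity, out DocumentInfo info))
        {
            return info;
        }

        throw new InvalidOperationException("Entity is not tracked by this map.");
    }
}
```

The library would call Track whenever it hydrates or saves an object and Lookup when it builds an update request, so a Book never needs a _rev property of its own.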

Requirement #4: LINQ expressions for Map/Reduce functions

CouchDB uses map/reduce functions expressed in JavaScript to define its views. For example:

map: function(doc) {
    // Emit the page count as the value so the reduce step can sum it.
    emit(doc.NumberOfPages, doc.NumberOfPages);
}
reduce: function(keys, values) {
    return sum(values);
}

This seems like a pretty good fit for LINQ expressions, no? Perhaps something like:

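using System.Collections.Generic;
using System.Linq;

// IMapReduce<TDocument, TKey> is a hypothetical interface the library would define.
public class BooksByNumberOfPages : IMapReduce<Book, int>
{
    // Map: emit one key (the page count) per book.
    public IEnumerable<int> Map(IEnumerable<Book> books)
    {
        return books.Select(b => b.NumberOfPages);
    }

    // Reduce: collapse the emitted keys into a single total.
    public int Reduce(IEnumerable<int> keys, IEnumerable<Book> values)
    {
        return keys.Sum();
    }
}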

This is not strictly required, and indeed probably wouldn’t get used that much because CouchDB views are very static, but could be a nice bit of syntactic sugar.

Requirement #5: ID generators like NHibernate

CouchDB encourages you to provide your own IDs for documents — otherwise it will give you big fat ugly GUIDs. I would like to see ID generators similar to NHibernate’s hi/lo algorithm, which produces human-readable and approximately-sequential IDs.
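
For illustration, here is a small sketch of the hi/lo idea, not NHibernate's actual implementation. HiLoIdGenerator is a hypothetical name, and the nextHi delegate stands in for whatever shared counter hands out the next block number (with CouchDB that could plausibly be a counter document updated under its revision check, but that part is an assumption and not shown).

```csharp
using System;

public sealed class HiLoIdGenerator
{
    private readonly Func<long> nextHi;   // fetches the next block number from shared storage
    private readonly int blockSize;       // how many IDs each block covers
    private readonly object gate = new object();
    private long hi;
    private int lo;

    public HiLoIdGenerator(Func<long> nextHi, int blockSize)
    {
        if (nextHi == null) throw new ArgumentNullException(nameof(nextHi));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a block fetch on the first call
    }

    // Returns the next ID, only touching shared storage once per block.
    public long Next()
    {
        lock (gate)
        {
            if (lo >= blockSize)
            {
                hi = nextHi();
                lo = 0;
            }

            return hi * blockSize + lo++;
        }
    }
}
```

Each client only needs a round trip to the shared counter once per block, which is why IDs come out approximately, rather than strictly, sequential when several clients are generating them at once.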

Final words

Of these ideas, revision tracking with an identity map is clearly the most important. In fact, the lack of it was the reason I decided not to use CouchDB for a recent project to which it was otherwise well suited: I didn’t have time to write a persistence framework for it first. Long term, however, CouchDB has a lot to offer the .NET community, and combined with officially maintained ports for Windows, a well-supported CouchDB .NET library could benefit many projects.

May 2, 2009

Comments

Kazi Manzur Rashid on May 3, 2009 at 2:02 am.

Great Idea.

Kane on May 3, 2009 at 8:53 pm.

In past projects I’ve used Lucene as a document database. It’s fast and supports searching (with a little work).

Wojo on May 5, 2009 at 10:19 am.

This is a great idea. I’m tempted to start this myself as I’ve been watching CouchDB very closely and we are fully committed to .NET.

Man, you really got my gears spinning. Maybe I’ll start an open source library for this in my free time :)

Dale Ragan on August 12, 2009 at 8:06 am.

I came across your post today and thought you might like to know about an open source project I created that is in line with your requirements. It is aptly named Ottoman to coincide with the CouchDB mantra. I just started it a couple of weeks ago, but I have a good jump on it now. You can find the source here: http://github.com/sinesignal/ottoman/tree/master

I didn’t like the SharpCouch library either, for the same reasons you gave. After not finding anything else, I decided to start this project. Let me know what you think.

Richard on August 12, 2009 at 9:52 am.

@Dale: awesome! Keen to see what you come up with :)

Dale Ragan on August 12, 2009 at 11:51 am.

@Richard: Great, I will keep you updated via twitter or through your blog.

alex pedenko on August 20, 2009 at 6:25 am.

take a look at divan – http://github.com/whenrik/Divan/tree/master. we use it in bistro. it’s pretty awesome.

Göran Krampe on September 10, 2009 at 1:47 am.

Cool! So we have someone outside of Foretagsplatsen using Divan? I am the developer of Divan and I would like to hear more about what you think of it.

regards, Göran

Göran Krampe on September 10, 2009 at 2:18 am.

Some more comments now that I actually read the article:
#0: I think the design of Divan is quite nice. :) I have tried to keep it KISS but still offer full queries in a reasonable way (using fluent style).

#1: Divan uses generics but we have so far stayed away from automatic serialization. Divan uses JSON.Net though and we gladly add serialization support if someone hacks it up. At Foretagsplatsen we wanted full control over JSON to do tricks etc, we were sick of “magic”.

#2: We do that, but perhaps we missed some.

#3: We do that, since we follow “CouchDB style”.

#4: We have not yet implemented a C# view server – but I did implement one in Smalltalk and it is not that hard to do. BUT… when you look at it in practice those few lines of js that you actually use is possibly not worth the effort. But hey, if someone whips it up – it would be complete separate from Divan anyway.

#5: Haven’t bothered.

Regarding a separate map, I did consider it – and it should not be hard to offer that as well – but we didn’t go that route.

Regarding writing a CouchDB library, Divan is out there on github and it is MIT and we gladly welcome committer, patches and ideas.

One thing I personally value is KISS. I am not a hard core C#-dude so I probably wrote the library the “vanilla style” :)

Also, soon I will push a more advanced example – because Divan actually supports all query options, attachments, etags (!) and lots of other details that the Trivial example doesn’t show.

Göran Krampe on September 22, 2009 at 1:38 am.

Just wanted to mention that I just pushed Couchdb-Lucene (first stab) integration and LINQ support (although limited) from Alex Pedenko.

regards, Göran

Kris on October 26, 2010 at 1:57 am.

Relax (.net CouchDB API) by Alex Robson
blog here:
http://sharplearningcurve.com/blog/?tag=/CouchDB

source:
http://github.com/arobson/Relax

enjoy!
