Wanted: A kickass .NET API for CouchDB

Over the past couple of weeks I’ve been playing with Apache CouchDB. If you haven’t seen it yet, you should check it out — it’s a document database that stores data as a flat collection of JSON objects (aka documents) with no schema, no normalization, no indexes, and no JOINs.

Communication is via RESTful HTTP commands — for example:

GET http://127.0.0.1:5984/books/42

…for retrieving the document with ID ‘42’ from a database named ‘books’. This would return the following JSON document:

{
  "_id":"42",
  "_rev":"946B7D1C",
  "Title":"Harry Potter",
  "Author":{
    "Initials":"JK",
    "Surname":"Rowling"
  },
  "Pages":"436",
  "Tags":["wizards", "children", "magic"]
}

It sounds pretty crazy if you’re used to big relational databases, but it’s an important shift that is becoming more and more popular with the likes of Google BigTable and Amazon SimpleDB. The fact is, if you’re not writing a transactional application where normalization is a priority, do you really need all the complexity of a relational database? Many common web applications like blogs and CMSs and Twitter could work quite happily with databases like CouchDB. Check out these two great articles that explain it further:

  • Assaf Arkin: CouchDB: Thinking beyond the RDBMS
  • Infoworld: Slacker databases break all the old rules

Anyway, CouchDB is still quite young, and hasn’t seen a lot of attention outside the Ruby and Python communities. This sucks. I would like to see a well-designed open-source library for .NET applications using CouchDB for persistence. Here are some features such a library might provide over and above raw HTTP WebRequests:

Requirement #1: Generics and automatic serialization/deserialization of .NET objects

At time of writing, I can only find one CouchDB library available for .NET. It’s called SharpCouch, and only works at the string level:

string json = db.GetDocument("127.0.0.1", "books", "42");

Instead, it shouldn’t be too hard to use something like JSON.NET and generics to automatically map between JSON strings and live .NET objects:

Book book = db.GetDocument<Book>("127.0.0.1", "books", "42");
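Here’s a minimal sketch of that typed layer, assuming .NET Core 3.0 or later so it can use the built-in System.Text.Json serializer in place of JSON.NET. The CouchJson helper and the Book/Author classes are hypothetical, shaped after the JSON document above; the HTTP request itself is left out so only the deserialization step is shown.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical domain classes shaped after the 'books/42' document.
// System.Text.Json ignores JSON members (such as _id and _rev) that have
// no matching property, so CouchDB bookkeeping doesn't need to appear here.
public class Book
{
    public string Title { get; set; }
    public Author Author { get; set; }
    public List<string> Tags { get; set; }
}

public class Author
{
    public string Initials { get; set; }
    public string Surname { get; set; }
}

public static class CouchJson
{
    // Turns the raw JSON body of a CouchDB GET response into a typed object.
    // A real client would call this after fetching
    // http://127.0.0.1:5984/{database}/{id} over HTTP.
    public static T Deserialize<T>(string json)
    {
        return JsonSerializer.Deserialize<T>(json);
    }
}
```

A generic GetDocument<T> would then just fetch the body and return CouchJson.Deserialize<T>(body).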

Requirement #2: Mapping RESTful HTTP error codes to .NET exceptions

Being a REST app, CouchDB uses HTTP status codes to indicate different states of success or failure, e.g. 404 when a document doesn’t exist, or 409 when you try to update a document using an out-of-date revision. Such errors should be mapped from raw WebExceptions into nicer types like DocumentNotFoundException and DocumentConflictException.
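A rough sketch of that mapping, assuming the status code has already been pulled out of the failed response (for example from the HttpWebResponse inside a WebException). The two exception types are the hypothetical ones named above, plus an assumed CouchException base class for everything else.

```csharp
using System;
using System.Net;

// Assumed base type for any CouchDB error the client doesn't special-case.
public class CouchException : Exception
{
    public HttpStatusCode StatusCode { get; }

    public CouchException(HttpStatusCode status, string message) : base(message)
    {
        StatusCode = status;
    }
}

public class DocumentNotFoundException : CouchException
{
    public DocumentNotFoundException(string message)
        : base(HttpStatusCode.NotFound, message) { }
}

public class DocumentConflictException : CouchException
{
    public DocumentConflictException(string message)
        : base(HttpStatusCode.Conflict, message) { }
}

public static class CouchErrors
{
    // Translates a non-success status from CouchDB into a meaningful exception.
    // 404 = missing document or database, 409 = revision conflict on update.
    public static CouchException FromStatus(HttpStatusCode status, string body)
    {
        switch (status)
        {
            case HttpStatusCode.NotFound:
                return new DocumentNotFoundException(body);
            case HttpStatusCode.Conflict:
                return new DocumentConflictException(body);
            default:
                return new CouchException(status, body);
        }
    }
}
```

A client method would then throw CouchErrors.FromStatus(status, body) whenever CouchDB answers with anything other than a 2xx code, so callers can catch DocumentConflictException specifically.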

Requirement #3: Implicit Optimistic Offline Lock via document revisions

Optimistic Offline Lock is a pattern used to prevent concurrency problems when two people edit the same piece of data at the same time. Basically, instead of locking an object while one person uses it, you allow multiple users access at the same time, but each user checks whether the data has changed before saving their modifications.

Optimistic Offline Lock is usually implemented via versioning. As well as an ID, each object has a version number that gets automatically incremented each time the object is modified. Before a client saves changes to the object, it checks to see if the version number has changed since it retrieved it. If it has, then you know someone else has modified the object in the meantime, and your copy is now stale.
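To make the versioning idea concrete, here’s a small in-memory illustration (not CouchDB, and not any particular library): each record carries a version number, and a save only succeeds if the caller still holds the latest version. VersionedStore and ConcurrencyException are made-up names.

```csharp
using System;
using System.Collections.Generic;

// Thrown when a save is attempted against a stale version.
public class ConcurrencyException : Exception
{
    public ConcurrencyException(string message) : base(message) { }
}

// In-memory Optimistic Offline Lock: every record carries a version number
// that is checked and incremented on each save.
public class VersionedStore
{
    private readonly Dictionary<string, (string Data, int Version)> _rows =
        new Dictionary<string, (string Data, int Version)>();

    public void Insert(string id, string data) => _rows.Add(id, (data, 1));

    public (string Data, int Version) Load(string id) => _rows[id];

    // The caller passes the version it originally read. If someone else has
    // saved in the meantime, the versions differ and the update is rejected.
    public int Save(string id, string data, int expectedVersion)
    {
        var current = _rows[id];
        if (current.Version != expectedVersion)
            throw new ConcurrencyException(
                $"'{id}' is at version {current.Version}, but you read version {expectedVersion}.");

        int newVersion = current.Version + 1;
        _rows[id] = (data, newVersion);
        return newVersion;
    }
}
```

CouchDB performs the equivalent check on the server using _rev, and answers a stale update with 409 Conflict.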

While this is optional in most relational databases (you have to opt in, for example by adding a rowversion column in SQL Server), document versioning is a non-negotiable feature of CouchDB: to update a document, you have to know its ID and its revision number.

The easiest way of keeping track of this would simply be to add a _rev property to all your domain entities:

public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
    public string Publisher { get; set; }
    public string _rev { get; set; } // hmm
}

Actually though, I’d prefer not to worry about revision numbers at all. They are only required by CouchDB, and I don’t want to let such concerns leak into my domain model. Instead, document revision numbers should be stored in a separate Identity Map — a list of references to all hydrated database objects currently active in my application.
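One way this could look, sketched under a few assumptions: .NET Core 2.0 or later (for ConditionalWeakTable.AddOrUpdate), and a hypothetical RevisionMap class that the persistence layer consults before every update. ConditionalWeakTable compares keys by reference and doesn’t keep entities alive, so the domain classes never need a _rev property.

```csharp
using System.Runtime.CompilerServices;

// The bookkeeping CouchDB needs for each loaded document.
public sealed class DocumentInfo
{
    public DocumentInfo(string id, string revision)
    {
        Id = id;
        Revision = revision;
    }

    public string Id { get; }
    public string Revision { get; }
}

// Hypothetical identity-map-style registry: remembers the _id/_rev of every
// hydrated entity without adding CouchDB fields to the domain classes.
// Keys are held weakly, so an entry disappears once its entity is collected.
public class RevisionMap
{
    private readonly ConditionalWeakTable<object, DocumentInfo> _documents =
        new ConditionalWeakTable<object, DocumentInfo>();

    // Call after loading a document, and again after each successful save
    // (CouchDB returns a new _rev every time a document changes).
    public void Track(object entity, string id, string revision)
    {
        _documents.AddOrUpdate(entity, new DocumentInfo(id, revision));
    }

    // Call before an update to find the _id and _rev to send back to CouchDB.
    public bool TryGet(object entity, out DocumentInfo info)
    {
        return _documents.TryGetValue(entity, out info);
    }

    // Call after deleting the document.
    public void Forget(object entity)
    {
        _documents.Remove(entity);
    }
}
```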

Requirement #4: LINQ expressions for Map/Reduce functions

CouchDB uses map/reduce functions expressed in javascript to define its views. For example:

map: function(doc) {
    emit(doc.NumberOfPages, doc.NumberOfPages);
}
reduce: function(keys, values) {
    return sum(values);
}

This seems like a pretty good fit for LINQ expressions, no? Perhaps something like:

// IMapReduce<TDocument, TValue> is the hypothetical interface such a library might define.
public class BooksByNumberOfPages : IMapReduce<Book, int>
{
    public IEnumerable<int> Map(IEnumerable<Book> books)
    {
        return books.Select(b => b.NumberOfPages);
    }

    public int Reduce(IEnumerable<int> keys, IEnumerable<Book> values)
    {
        return keys.Sum();
    }
}

This is not strictly required, and indeed probably wouldn’t get used that much because CouchDB views are very static, but it could be a nice bit of syntactic sugar.
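As a rough illustration of the LINQ analogy, this runs the map/reduce idea over an in-memory list: the map step emits one (key, value) row per book, keyed and valued by its page count, and the reduce step sums the values, like sum(values) in a JavaScript view. PagedBook and LocalView are hypothetical; translating expressions like these into a real CouchDB view is the hard part a library would have to solve.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type with a page count.
public class PagedBook
{
    public string Title { get; set; }
    public int NumberOfPages { get; set; }
}

public static class LocalView
{
    // Map: one (key, value) row per document, like emit(key, value).
    public static IEnumerable<KeyValuePair<int, int>> Map(IEnumerable<PagedBook> books)
    {
        return books.Select(b => new KeyValuePair<int, int>(b.NumberOfPages, b.NumberOfPages));
    }

    // Reduce: collapse the emitted values into a single total, like sum(values).
    public static int Reduce(IEnumerable<KeyValuePair<int, int>> rows)
    {
        return rows.Sum(row => row.Value);
    }
}
```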

Requirement #5: ID generators like NHibernate

CouchDB encourages you to provide your own IDs for documents — otherwise it will give you big fat ugly GUIDs. I would like to see ID generators similar to NHibernate’s hi/lo algorithm, which produces human-readable and approximately-sequential IDs.
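A minimal sketch of a hi/lo generator, assuming the ‘hi’ values come from some shared, persisted counter (NHibernate keeps its counter in a hibernate_unique_key table; here it’s just a delegate). Each reserved hi value covers a block of blockSize IDs, so the client only needs one round-trip per block. HiLoIdGenerator is a hypothetical name, and this shows the general hi/lo idea rather than NHibernate’s exact formula.

```csharp
using System;

public sealed class HiLoIdGenerator
{
    private readonly Func<long> _nextHi;   // assumed source of reserved hi values
    private readonly int _blockSize;       // how many IDs each hi value covers
    private readonly object _sync = new object();
    private long _hi;
    private int _lo;

    public HiLoIdGenerator(Func<long> nextHi, int blockSize)
    {
        if (blockSize < 1) throw new ArgumentOutOfRangeException(nameof(blockSize));
        _nextHi = nextHi ?? throw new ArgumentNullException(nameof(nextHi));
        _blockSize = blockSize;
        _lo = blockSize; // forces a hi reservation on the first call
    }

    public long NextId()
    {
        lock (_sync)
        {
            if (_lo >= _blockSize)
            {
                _hi = _nextHi(); // one trip to the shared counter per block
                _lo = 0;
            }
            return _hi * _blockSize + _lo++;
        }
    }
}
```

Calling NextId().ToString() would give short, roughly sequential numeric document IDs instead of GUIDs; gaps only appear when a reserved block isn’t fully used.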

Final words

From these ideas, clearly revision tracking with an identity map is the most important — in fact this was the reason I decided not to use CouchDB for a recent project to which it was well suited (because I didn’t have time to write a persistence framework for it first). Long term, however, CouchDB has a lot to offer the .NET community, and combined with officially-maintained ports for Windows, a well-supported CouchDB .NET library could bring benefit to many projects.

Spotted in the wild…

CREATE FUNCTION [dbo].[fnGetInitialDate]
(
    @Month       int = null,
    @Year        int = null,
    @CurrentDate datetime
)
RETURNS int
AS
BEGIN
    Declare @ReportDate   datetime
    Declare @InitialMonth int

    if @month IS null
        set @month = Month(Dateadd(m, -1, @CurrentDate))
    if @Year IS null
        set @Year = Year(Dateadd(m, -1, @CurrentDate))

-- NORMALISE Date
/*
    Select
        @Date = StartDateTime,
        @DateDiff = AgeInMonths
    from
        (
            select top 1
                StartDateTime,
                AgeInMonths
            from
                dbo.<censored>
            where
                StartDateTime is not null
        ) TA
*/
    set @ReportDate = Cast(@Year AS VARCHAR(4)) + '-' + Right('00' + Cast(@month AS VARCHAR(2)),2) + '-01'
    set @InitialMonth = datediff(m, @CurrentDate, @ReportDate)

    return @InitialMonth
END

This SQL Server function features:

  • A name that doesn’t tell you anything ✔
  • Comments indicating non-existent functionality ✔
  • Big sections commented out with no explanation ✔
  • Using strings for date arithmetic ✔

After much thought I concluded that this code is functionally equivalent to a single DATEDIFF(MONTH, @CurrentDate, …) against the first day of the requested month and year. If you omit both, they default to the month before @CurrentDate, so the function simply returns -1.
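For comparison, here’s the same calculation as a short C# sketch, using plain month arithmetic instead of building a date string. It relies on the fact that DATEDIFF(m, a, b) counts month boundaries crossed, which works out to (b.Year * 12 + b.Month) - (a.Year * 12 + a.Month). ReportMonths is a hypothetical name.

```csharp
using System;

public static class ReportMonths
{
    // Mirrors fnGetInitialDate: months from currentDate to the start of the
    // requested month, with the same "previous month" defaults.
    public static int InitialMonth(int? month, int? year, DateTime currentDate)
    {
        // Same defaults as the SQL: fall back to the month before currentDate.
        DateTime previous = currentDate.AddMonths(-1);
        int m = month ?? previous.Month;
        int y = year ?? previous.Year;

        // Equivalent to DATEDIFF(m, currentDate, first day of y-m).
        return (y * 12 + m) - (currentDate.Year * 12 + currentDate.Month);
    }
}
```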

This function is still at large on a production database server (names have been suppressed to protect the innocent). Developers are advised not to approach it, as it is considered dangerous and extremely confusing.

IUnityContainerAccessor must use a static container instance

I found an interesting problem this morning when my ASP.NET MVC application mysteriously broke after adding an HttpModule in the web.config. Here’s the problem code:

public class MvcApplication : HttpApplication, IUnityContainerAccessor
{
    IUnityContainer container;

    public IUnityContainer Container
    {
        get { return container; }
    }

    public void Application_Start()
    {
        this.container = new UnityContainer();
        // this.container.RegisterTypes etc
    }
}

The container was being configured fine in Application_Start, but then UnityControllerFactory would throw “The container seems to be unavailable in your HttpApplication subclass” exceptions every time you tried to load a page — this.container was somehow null again.

After doing a little digging and finding this article where someone had the same problem with Windsor, it seems ASP.NET will create multiple HttpApplication instances when parallel requests are received. However, Application_Start only gets called once, so anything you would like to share between multiple instances (e.g. your IoC container) must be static:

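public class MvcApplication : HttpApplication, IUnityContainerAccessor
{
    static IUnityContainer container; // good, shared by every HttpApplication instance
    // IUnityContainer container;     // bad, not shared between HttpApplication instances

    public IUnityContainer Container
    {
        get { return container; }
    }

    public void Application_Start()
    {
        container = new UnityContainer(); // no "this." for a static field
        // container.RegisterTypes etc
    }
}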
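The behaviour is easy to reproduce outside ASP.NET. In this sketch, FakeApplication is a hypothetical stand-in for HttpApplication: the start-up method runs once on the first instance, and a second instance (like the ones ASP.NET creates for parallel requests) only sees the static field.

```csharp
// FakeApplication is a hypothetical stand-in for HttpApplication; plain
// objects stand in for the Unity container.
public class FakeApplication
{
    private static object sharedContainer;   // one copy, shared by every instance
    private object instanceContainer;        // a separate copy per instance

    // Like Application_Start: only ever called on the first instance.
    public void Start()
    {
        var container = new object();        // stand-in for new UnityContainer()
        sharedContainer = container;
        instanceContainer = container;
    }

    public object SharedContainer => sharedContainer;
    public object InstanceContainer => instanceContainer;
}
```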

Troubleshooting jQuery.getJSON() and Google’s GData API

I’ve just been playing around with calling the Google Books API via jQuery. Here are some tips in case you run into trouble with “Access to restricted URI denied” or “invalid label” errors:

  • The JSONP parameter is callback=? (jQuery automagically fills in the rest).
  • You need to specify the alt parameter as json-in-script, not just json.

Here is an example of a full AJAX call to the Google Book Search API:

$.getJSON("http://books.google.com/books/feeds/volumes?alt=json-in-script&callback=?",
    { q: "the great gatsby" },
    function(data, textStatus) {
        $.each(data.feed.entry, function(i, entry) {
            // do stuff with each result
        });
    });

Remove content from files with Powershell

I love Windows PowerShell. It’s the proper shell/scripting environment Windows was missing, filling a need that DOS commands and batch files could never satisfy. I know it’s been out for a while now, but I’ve only recently started getting on top of it, and it’s been fantastic so far.

Quite often I find myself needing to do little maintenance jobs that involve iterating through files and transforming them: C# source code, SQL scripts, my MP3 collection etc. PowerShell is the perfect tool for these sorts of tasks. Here’s a very simple (and probably not that efficient) snippet I wrote last week to strip out SQL Server Management Studio’s useless ‘descriptive header’ comments from a nested hierarchy of 2,000 generated SQL scripts:

/****** Object:  Table [dbo].[hibernate_unique_key]    Script Date: 04/12/2009 11:35:29 ******/
CREATE TABLE [dbo].[hibernate_unique_key](
        [next_hi] [int] NULL
) ON [PRIMARY]
GO

dir -recurse -filter *.sql $src | foreach {
        $file = $_.fullname
        echo $file
        # Keep every line except the SSMS "/****** Object:" header comment
        (get-content $file) | where { $_ -notmatch '^\s*/\*{6} Object:' } | out-file $file
}

How easy was that? I definitely need to learn more of this stuff!

DDD and C# language semantics

This morning I was reading an article about properties vs attributes by Eric Lippert, one of the designers of the C# language. In it, he explained:

Properties and fields and interfaces and classes and whatnot are part of the model; each one of those things should represent something in the model world. If a property of a “rule” is its “description” then there should be something in the model that you’re implementing which represents this. We have invented properties specifically to model the “an x has the property y” relationship, so use them.

That’s not at all what attributes are for. Think about a typical usage of attributes:

[Obsolete]
[Serializable]
public class Giraffe : Animal
{ ...

Attributes typically do not have anything to do with the semantics of the thing being modeled. Attributes are facts about the mechanisms – the classes and fields and formal parameters and whatnot. Clearly this does not mean “a giraffe has an obsolete and a serializable.” This also does not mean that giraffes are obsolete or serializable. That doesn’t make any sense. This says that the class named Giraffe is obsolete and the class named Giraffe is serializable.

Now he doesn’t explicitly mention domain-driven design, but I think it’s an important point for .NET developers who’re trying to get into it: attributes are only for implementation details – don’t use them for domain model concepts.
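A small, made-up example of the difference. The DiscontinuedAttribute version attaches the fact to the class itself, so it can’t express that one particular product is discontinued and another isn’t; the property version models it per instance, which is what the domain actually means.

```csharp
using System;

// Anti-pattern (hypothetical): modelling a domain fact with an attribute.
// The attribute describes the *class*, so every LegacyProduct is "discontinued".
[AttributeUsage(AttributeTargets.Class)]
public sealed class DiscontinuedAttribute : Attribute { }

[Discontinued]
public class LegacyProduct
{
    public string Name { get; set; }
}

// Preferred: "a product has the property discontinued" is part of the model,
// so it is a property that can differ from one product to the next.
public class Product
{
    public string Name { get; set; }
    public bool IsDiscontinued { get; set; }
}
```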

Keep track of your Internet browsing at work with 8aweek

According to a survey conducted last year, the average employee fritters away eight hours a week on non-work-related activities. Personal Internet browsing was cited as the number one culprit; I know I’m a little bit guilty of it, too. Leaving my personal e-mail open all day (and stopping to read new messages as they arrive) is something I’m particularly bad at.

This week I discovered 8aweek, a handy Firefox plugin designed to help you reclaim some of this time by monitoring non-work-related browsing. 8aweek installs as a toolbar (which, thankfully, supports Firefox customization so you can move the buttons around), and maintains a no-registration-required profile on their website. You set up a list of restricted sites that you want to limit access to, and a daily quota for the total amount of time you can spend browsing them (the default is half an hour).

When you exceed this quota, you get a nice warning message in the middle of your browser window:

Your restricted browsing time is up

Whether you obey its restrictions or just keep clicking “10 more minutes” is another question, but at least being made aware of the issue is helpful. After all, guilt is a great motivator.

I’ve been using it for a couple of days and I already feel like I’ve been more productive at work. Plus, it’s nice to be able to keep on top of this stuff yourself, rather than having limits imposed on you by your employer. 8aweek also gives you a nice visualisation of your browsing habits, and even grades you on your behaviour (I’m all Cs at the moment).

Domain entities vs presentation model objects, part 2: mapping

This is the second half of a two-part article. Read the first half here: Domain entities vs presentation model objects.

In my last post, I wrote about the difference between domain entities and presentation model objects. Remember my two task classes — the transactional domain entity and the UI presentation object? They’re very similar, and this could lead to a lot of ugly hand-written plumbing code mapping fields on one to the other.

// Task domain entity.
public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
    // ...etc.
}

// Task presentation model object.
public class TaskView
{
    public int Id;
    public string Name;
    public string DueDate;
    public bool IsUnscheduled;
    public bool IsOverDue;
    public long SortIndex;
}

Instead, I’m using an open-source .NET library called AutoMapper by Jimmy Bogard — an object-object mapper (OOM) that sets values from one type to another.

Setting up a map from one type to another is dead simple — AutoMapper will automatically match fields with the same name. For other stuff we use lambda expressions, or delegate to another class. Here’s what my Task-to-TaskView mapping looks like:

// Set up a map between Task and TaskView. Note fields with the same names are
// mapped automagically!
Mapper.CreateMap<Task, TaskView>()
    .ForMember(dest => dest.DueDate, opt => opt.AddFormatter<DueDateFormatter>())
    .ForMember(dest => dest.SortIndex, opt => opt.ResolveUsing<SortIndexResolver>());

That was easy! Note I’m using a custom value formatter for the DueDate:

public class DueDateFormatter : IValueFormatter
{
    public string FormatValue(ResolutionContext context)
    {
        DateTime? d = context.SourceValue as DateTime?;
        if (d.HasValue)
            return d.Value.ToString("dddd MMM d");
        else
            return "Anytime";
    }
}

…and a custom resolver for the sort index (an integer derived from the task’s due date). Note that while a formatter transforms a single field on its own, a resolver examines the whole object to derive a value:

public class SortIndexResolver : IValueResolver
{
    public ResolutionResult Resolve(ResolutionResult source)
    {
        Task t = source.Value as Task;
        DateTime sortDate = t.DueDate.HasValue ?
            t.DueDate.Value : DateTime.MaxValue;
        long sortIndex =
            Convert.ToInt64(new TimeSpan(sortDate.Ticks).TotalSeconds);
        return new ResolutionResult(sortIndex);
    }
}

With dedicated classes for formatting and resolving values, tests become very easy to write (although I did write my own ShouldFormatValueAs() test helper extension method):

[TestFixture]
public class When_displaying_a_due_date
{
    [Test]
    public void Should_display_null_values_as_anytime()
    {
        new DueDateFormatter().ShouldFormatValueAs<DateTime?>(null, "Anytime");
    }

    [Test]
    public void Should_format_date()
    {
        new DueDateFormatter().ShouldFormatValueAs<DateTime?>(
                new DateTime(2009, 02, 28), "Saturday Feb 28");
    }
}

Putting it into practice, it becomes a one-liner to create a new TaskView instance given a Task.

// Grab a Task from the repository, and map it to a new TaskView instance.
Task task = this.tasks.GetById(...);
TaskView taskView = Mapper.Map<Task, TaskView>(task);

Awesome!

Domain entities vs presentation model objects

This is the first half of a two-part article. Read the second half here: Domain entities vs presentation model objects, part 2: mapping.

When putting domain-driven design (DDD) principles into practice in a real application, it’s important to recognize the difference between your domain model and presentation model.

Here’s a simple real-life example from a little MVC application that displays a to-do list of tasks. The task domain entity is very simple, with an identifier, a name and an optional due date:

// task domain entity
public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
    // ...etc
}

Pretty much everything we want to know about this Task can be derived from its properties via Specifications. For example, an OverDueTaskSpecification will tell us whether or not the task is overdue by checking if the due date has already passed, and an UnscheduledTaskSpecification will tell us whether the task is unscheduled by checking if the due date is null.
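A minimal sketch of those two specifications, assuming the common ISpecification<T> / IsSatisfiedBy convention (the interface is hypothetical here, not from a library). The current time is passed into OverDueTaskSpecification rather than read from DateTime.Now, so the rule is easy to test.

```csharp
using System;

// Task domain entity, as in the snippet above.
public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
}

// A named, reusable rule about an entity.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

// Overdue: the task has a due date, and that date has already passed.
public class OverDueTaskSpecification : ISpecification<Task>
{
    private readonly DateTime _now;

    public OverDueTaskSpecification(DateTime now) { _now = now; }

    public bool IsSatisfiedBy(Task task) =>
        task.DueDate.HasValue && task.DueDate.Value < _now;
}

// Unscheduled: the task has no due date at all.
public class UnscheduledTaskSpecification : ISpecification<Task>
{
    public bool IsSatisfiedBy(Task task) => !task.DueDate.HasValue;
}
```

The application layer can evaluate these when building a TaskView, so the view just reads IsOverDue and IsUnscheduled.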

However, when rendering the task to the user, the application’s view must remain dumb — a passive view — and cannot work this sort of stuff out for itself. It is not enough to simply pass a collection of domain entities to the view; all relevant details must be explicitly provided so the view has all the information it requires without having to do any work itself.

Together, all these UI-specific details form a presentation model object, which is effectively identical to a DTO in that it only carries data and has no methods or behaviour of its own (some people call them PTOs). Here’s what my application’s presentation model object for a task looks like:

// task presentation model object
public class TaskView
{
    public int Id;
    public string Name;
    public string DueDate;
    public bool IsUnscheduled;
    public bool IsOverDue;
    public long SortIndex;
}

The ID and name fields are exactly the same, but DueDate is now a string which will either hold a friendly-formatted date or ‘Anytime’ if the task is not scheduled. Unscheduled and overdue are now explicit flags so the view can immediately identify tasks that need special display like highlighting.

Most of the front-end user interaction in this application is implemented via JavaScript, so we need an index for sorting and appending tasks in the correct order. It needs to be simple to parse and fast to compare, so I chose a long integer which is resolved from the due date.

Note this is a very simple example with only one domain entity class, to which we just added some new properties. Most real scenarios will require some degree of flattening aggregate object graphs into a single class, and hiding fields that are not relevant.

In the big scheme of things, presentation model objects are defined in the application layer (see the Onion Architecture), where they are validated and mapped to and from domain entities. All public interfaces to the application layer are expressed in terms of presentation model objects, so domain entities do not leak into the UI layer.

In an ASP.NET MVC application for example, presentation model objects would be used as the Models for all views and action results.

Update: a friend asked me why I don’t format DueDate in the view, as proper MVC separation of concerns would dictate. The reason is that as well as rendering them directly in the view when a page is requested, this application also sends presentation model objects to the client via AJAX. I decided that having one single place (instead of two) where tasks’ DueDates are formatted would make maintenance easier in the long run.

How to Lose Traction on a Personal Software Project

So you’ve started writing your first program: great stuff! Software development can be very tricky, especially if you are not absolutely sure what you are doing. In this blog we are exploring different software development methodologies. Here are a few tips and traps to watch out for along the way.

Long periods of time when your program doesn’t compile and/or run at all

You enthusiastically start work on some big changes (e.g. re-architecting your application), but stop because you hit a brick wall, or don’t have the time to finish it. The source code is left in a crippled and unfinished state. You can’t do anything with any of your code until this is fixed, and the longer you leave it, the more you’ll forget and the harder it will be to get started again.

In Continuous Integration this is called a broken build, and is a big no-no because it impacts other people’s ability to work. Without the pressure of a team environment pushing you forward, having a roadblock like this in your path makes it very easy to lose faith and motivation.

Make massive changes on a whim without revision control or a backup

There’s nothing worse than changing your mind about the design of something after you’ve already thrown away the old one. When you do something that involves changing or deleting existing code, be sure to make some sort of backup in case things don’t work out.

If you haven’t taken the plunge with revision control yet, I highly recommend looking at some of the free SVN or Git hosting services out there; once you get into it, you’ll never look back.

Ignore the YAGNI principle and focus on the fun/easy bits

Focusing on things like validation, eye candy or even general purpose ‘utility’ functions is a great way to build up a large complex code base that doesn’t do anything useful yet. Focus on the core functionality of your software first — your main features should be complete before you start thinking about nice-to-haves.

Throw your code away and start from scratch

As Netscape famously discovered a few years ago, throwing away existing code to start afresh is almost never a good idea. Resist the urge and make a series of small, manageable code refactorings instead.

Start programming before you have a clear goal in mind

Instead of a command line tool, maybe my application would be better as a simple GUI application? Or, I was originally writing my homebrew game for my old Xbox 360, but now that we’ve bought a Wii, I’ll port it to that instead!

What are you actually trying to achieve? Spend some time with a pen and some paper coming up with a really clear vision of what you’re trying to create — e.g. screen mock-ups. If you don’t know what you’re writing from the start, the goal posts will keep moving as you change your mind, and you’ll have no chance of finishing it.

Get carried away with project hype before you’ve actually got anything to show for yourself

Spending hours trying to think of the perfect name for your software, designing an icon, choosing the perfect open-source license and making a website won’t get you any closer to having a working application. Get something up and running first, and worry about telling people about it later.

Start a million new features and don’t finish any of them

Jumping from one idea to another without finishing anything is like spinning a car’s wheels and not going anywhere. Make a list of features your program should have, and put them in order of most-to-least important. Work on them, one-at-a-time, in that order.