Turning a case-sensitive string into a case-insensitive string

Here’s a trick I recently picked up when dealing with document databases. Say you need to save objects that have IDs that only differ by case, but you’re using a document DB like Raven where keys are not case sensitive. In Google Books for example, oT7wAAAAIAAJ is an article in Spanish from a Brazilian journal, but OT7WAAAAIAAJ is a book about ghosts. RavenDB would not be able to recognize that these are two different IDs — so attempting to store them would result in a single document that gets overwritten each time. What can you do?

If it were the other way around — database is case sensitive, app is not — simply discarding the case information by converting everything to a common lowercase representation (a lossy transformation) would do the trick.

Our situation is a bit harder, however. We somehow need to represent the key as a string including each letter and also store whether it was uppercase or not. You could write a custom converter for this (maybe using special escape characters to indicate uppercase letters)… but a much easier way would be simply to convert it to Base32.

Why Base32? Base64 would produce shorter strings (a more efficient encoding), but it uses both upper- and lower-case characters, so you are still at risk of collisions. Base32, on the other hand, only uses uppercase, so it is safe to use for a case-insensitive key-value store.

Hex would work too (its output is digits plus A-F, which can be emitted all-uppercase), but it needs even more space to do so: two characters per byte, versus Base32’s eight characters per five bytes.
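To make the trick concrete, here’s a quick Python sketch of the idea (the principle is identical in .NET with any Base32 library; the function names here are my own): encoding the two Google Books IDs produces distinct, all-uppercase keys that are safe in a case-insensitive store, and decoding losslessly recovers the original case-sensitive ID.

```python
import base64

def to_safe_key(id_str):
    # Base32-encode the UTF-8 bytes; output uses only A-Z, 2-7 and '=' padding
    return base64.b32encode(id_str.encode("utf-8")).decode("ascii")

def from_safe_key(key):
    # Lossless: decoding recovers the original case-sensitive ID
    return base64.b32decode(key).decode("utf-8")

article = to_safe_key("oT7wAAAAIAAJ")  # the Spanish article
book = to_safe_key("OT7WAAAAIAAJ")     # the book about ghosts
```

The two encoded keys differ even when compared case-insensitively, so RavenDB will happily store both documents side by side.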

Mini book review: Recipes With Backbone

Disclaimer: this is my first-ever book review on this blog! I’m not a javascript/HTML developer by trade so it’s not going to be a critical review or anything — just how I found reading it.

I’ve worked on a few web applications over the years, and I’m familiar with at least a couple of other MVC-style frameworks, but it’s all been .NET, and recently, all on the desktop. Only in the past few months have I attempted my first full-on browser-based javascript application, using the fantastic Backbone.js. The journey so far has been pretty bumpy, but I’m finally getting to the point where I’m comfortable enough with Backbone (and the javascript language itself) to really start being productive. Recently, though, I’ve been hitting some complex situations that I’m not sure how to implement, and that I’m struggling to find answers for online. I’m also always on the lookout for tips confirming that the code I’ve already written is on the right track.

I stumbled across Recipes With Backbone by chance, via a blog post by one of the authors discussing how to implement default routes. I was a bit hesitant at first when I couldn’t find many reviews of it, but ‘advanced Backbone’ is a pretty new subject area (even on blogs) and it was cheap, so I bought a copy. I am glad to say I was very satisfied with my purchase.

The book has 152 pages, and it took me a day to read on my Kindle. It covers:

  1. Writing Client Side Apps (Without Backbone)
  2. Writing Backbone Applications
  3. Namespacing
  4. Organizing with Require.js
  5. View Templates with Underscore.js
  6. Instantiated View
  7. Collection View (Excerpt)
  8. View Signature
  9. Fill-In Rendering
  10. Actions and Animations
  11. Reduced Models and Collections
  12. Non-REST Models
  13. Changes Feed (Excerpt)
  14. Pagination and Search
  15. Constructor Route
  16. Router Redirection
  17. Evented Routers
  18. Object References in Backbone
  19. Custom Events
  20. Testing with Jasmine

The book is based around a hypothetical online calendar application, analogous to Google Calendar. I use GCal a lot, so it was nice to have familiar examples — some books use made-up applications that are unfamiliar or too vague (even a blog or a todo list could be designed in many different ways). But when examples are based on a specific real-world application like GCal or Twitter, you already know how it’s supposed to behave, and it simply becomes a matter of mapping that behaviour to the code examples on the page.

The book is structured as a series of enhancements to the calendar — starting with a plain jQuery $.getJSON() on a server-generated HTML page, just like we used to write in 2008. This was a good familiar starting point for me, before leaping into a basic Backbone structure and then refactoring it and adding more advanced behaviour.

The twenty chapters are each presented in a problem/solution format. They are very brief, but I really liked this — they are clear, succinct, get straight to the point, and are very readable. Code is explained in 3-4 line chunks at a time, which fit well on my Kindle (which has a small screen and no colors). Overall I thought it was very good value for the time spent reading.

For me, the book was valuable because:

  • It corrected some things I thought I already knew 🙂 like $(el) injection
  • It showed patterns for dealing with things like view deactivation and dangling event references (as a WPF developer this stuff gives me nightmares)
  • It showed how simply and elegantly difficult things like paging could be implemented
  • It suggested some neat things I hadn’t even thought about yet e.g. constructor routing

The book also includes a chapter on Require.js, which it presents early on as a foundation — chapter four — right after namespacing. I’m a fan of require.js and have no doubt it will soon become a standard requirement for any serious JS development in future. But right now it’s not widely supported and — in my experience — can cause a lot more problems than it solves. Unless you’re totally comfortable accepting the fact you’ll probably have to hack a lot of third party jQuery plugins just to make them work, I would not recommend require.js to beginners until the rest of the community catches up.

Regardless, Recipes With Backbone is still the single best text I’ve read on Backbone (aside from the Backbone docs themselves), and I’m looking forward to seeing what else these guys write. I would definitely recommend it to anyone wanting to take their Backbone skills to the next level. Go check it out and read the sample chapters here: http://recipeswithbackbone.com

Returning camelCase JSON from ASP.NET Web API

Loving ASP.NET Web API but not loving the .NET-centric PascalCase JSON it produces?

// a .NET class like this...
public class Book
{
    public int NumberOfPages { get; set; }
    public string Author { get; set; }
}

// ... should be serialized into JSON like this
{
    "numberOfPages": 42,
    "author": "JK Rowling"
}

Luckily this is quite easy to fix with a custom formatter. I started with Henrik Nielsen’s custom Json.NET formatter and changed it slightly to use a camelCase resolver and also indent the JSON output (just for developers; we would turn this off once the app was deployed). You can grab the code here: https://gist.github.com/2012642
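If you’re curious what the renaming itself amounts to, here’s an illustrative Python sketch of the PascalCase-to-camelCase key transform the resolver performs (the function names are my own, and the real Json.NET resolver handles more edge cases than this):

```python
def camel_case(name):
    # Lower-case the first character, leave the rest untouched
    return name[:1].lower() + name[1:]

def camel_case_keys(obj):
    """Recursively rename dict keys from PascalCase to camelCase."""
    if isinstance(obj, dict):
        return {camel_case(k): camel_case_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [camel_case_keys(v) for v in obj]
    return obj

book = camel_case_keys({"NumberOfPages": 42, "Author": "JK Rowling"})
```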

Then just swap out the default JSON formatter in your global.asax.cs:

var config = GlobalConfiguration.Configuration;

// Replace the default JsonFormatter with our custom one
var index = config.Formatters.IndexOf(config.Formatters.JsonFormatter);
config.Formatters[index] = new JsonCamelCaseFormatter();

RavenDB Includes much simpler than you think

Here’s something I’ve been struggling to get my head around over the past few days as I’ve been getting deeper into RavenDB. The example usage from their help page on pre-fetching related documents:

var order = session.Include<Order>(x => x.CustomerId)
    .Load("orders/1234");

// this will not require querying the server!
var cust = session.Load<Customer>(order.CustomerId);

That doesn’t look too difficult at first glance – it looks pretty similar to futures in NHibernate, which I’ve used plenty of times before. But hang on. The first line instructs RavenDB to preload a second object behind the scenes, but how does it know that object is specifically a Customer?

session.Include<Order>(x => x.CustomerId)

At first I thought there should be a Customer type argument somewhere. Something like this:

// WRONG
var order = session.Include<Customer>(...)
    .Load<Order>(...)

Otherwise how can RavenDB know I want a Customer and not some other type? Maybe the Order.CustomerId property is actually stored in RavenDB as a sort of strongly-typed weak reference to another object? Maybe the order returned is some sort of runtime proxy object with referencing metadata built in?

No no no. It is much simpler than that. Let’s take a step back.

In a traditional SQL database, you need both the type (table name) and ID to locate an object. The ID alone is not enough, because two differently typed objects may have the same ID (but in different tables). So you need the type as well.

In RavenDB, you only need the ID. This is possible because RavenDB does not have the concept of types – under the covers it’s effectively just one big key-value store of JSON strings which get deserialized by the client into different CLR types. Even though the RavenDB .NET client is strongly typed (Customers and Orders), the server has no awareness of the different types stored within it.*
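As a toy illustration of that model (Python, purely conceptual — this is not how the RavenDB client is implemented), the server side is effectively one flat dictionary of JSON strings, and an ‘include’ is just preloading a second blob by its ID; the type only appears when the client deserializes:

```python
import json

# One flat key-value store of JSON strings -- no tables, no types
store = {
    "orders/1234": json.dumps({"CustomerId": "customers/99", "Total": 12.5}),
    "customers/99": json.dumps({"Name": "Alice"}),
}

def load_with_include(doc_id, include_field):
    """Load a document, and preload whatever document its include_field points at."""
    doc = json.loads(store[doc_id])
    included_id = doc[include_field]
    # The server doesn't care what type the included document is --
    # it's just another blob under another key in the same store.
    session_cache = {doc_id: doc, included_id: json.loads(store[included_id])}
    return doc, session_cache

order, cache = load_with_include("orders/1234", "CustomerId")
customer = cache[order["CustomerId"]]  # served from the cache, no second round-trip
```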

This is what makes includes work. Raven doesn’t need to know the type of the document being included, it’s just another chunk of JSON (which could be deserialized as anything). And the ID can only point to exactly one document because all documents are stored in the same single key-value store. So, back to the original example:

// This line instructs RavenDB to preload the JSON document
// that has an ID == x.CustomerId
var order = session.Include<Order>(x => x.CustomerId)
    .Load("orders/1234");

// This line accesses the previously-loaded JSON document and
// deserializes it as a Customer.
var cust = session.Load<Customer>(order.CustomerId);

That makes much more sense now.

* actually RavenDB does keep metadata about the CLR type, but for unrelated purposes.

Yet another reason to love REST

There are a lot of reasons why you should love REST. It’s fast, simple, stateless, and easy to debug. This makes it absolutely fantastic to test against.

REST APIs get you great end-to-end test coverage

Line for line, an end-to-end system test covers a lot more code than a class-level unit test deep down in the stack. End-to-end tests also more closely simulate a user’s actions, exercising realistic scenarios that need to be verified. So for the most bang for your buck, you want to be testing at the outermost level possible.

The ultimate level here is testing through UI automation — simulating clicks and looking for responses on screen, as a real user would do. However, although UI automation libraries are improving, UI tests still tend to be very complex to write and are often brittle to things like positioning and layout changes. A public API provides a fantastic alternative ‘hooking’ point where application behaviour can be invoked from external code without involving the UI — and a well-designed API (one that keeps clients fast and lightweight) typically still mirrors the fields and actions on screen pretty closely.

So testing against an API avoids the complexity of UI testing but still gives the system a thorough end-to-end workout.

Tests written against REST APIs aren’t brittle

Although the back-end behind your API may change considerably, a public API must continue to work in the exact same way or you’ll break all your clients and have a lot of very angry users. So as well as covering more lines of code for less effort, tests written against a public API won’t be as brittle and need updating as often as ‘subcutaneous’ ones testing against internal code structures.

Everyone can connect to a REST API

If you had to choose a protocol for interacting with your application, you couldn’t find a much simpler one than REST. Stateless HTTP, verbs, status codes and JSON — 99% of the time you need nothing but a browser to debug it, and your tests can be written in a completely different language than the API back-end and still be completely understandable.

Compare this to previous ‘standard’ HTTP protocols that were often anything-but understandable – even with things like service discovery and WSDLs (designed to help developers), debugging SOAP XML mismatches between a Java Metro client and a .NET WCF service is one personal hell I hope never to have to relive.

Not just for big APIs

Remember you don’t need a formal public API like Twitter or Facebook to realise the testing benefits mentioned here. Pretty much any sort of REST endpoint will do. The JSON handlers powering your single-page web app would be a great entry point to get started testing underlying behaviour, for example.

Fast empty Raven DB sandbox databases for unit tests

Say you have some NUnit/xUnit/Mspec tests that require a live Raven DB instance. Specifically:

  • You do not want your test to be affected by any existing documents, so ideally the Raven DB database would be completely empty.
  • Your test may span multiple document sessions, so doing it all within a single transaction and rolling it back is not an option.
  • You want your tests to run fast.

What are your options?

Raven DB currently has no DROP DATABASE or equivalent command. The recommended method is simply to delete Raven DB’s ServerData or ServerTenants directories, but this requires restarting the Raven DB service (expensive). Also any live document stores may throw an exception at this point.


One option that Raven DB makes very cheap, however, is spinning up new database instances (aka tenants). In fact all you need to do is specify a new DefaultDatabase and the document store will spin a new database up for you. For example:

var store = new DocumentStore
{
    Url = "http://localhost:8080",
    DefaultDatabase = "MyAppTests-" + DateTime.Now.Ticks
};
store.Initialize();

// now you have an empty database!

Pretty easy, huh? I wrote a little test helper to manage these sandbox databases, stores and sessions. Here’s how you would use it in a tenant-per-fixture test:

[TestFixture]
public class When_doing_something
{
    [TestFixtureSetUp]
    public void SetUp()
    {
        RavenDB.SpinUpNewDatabase();
        using (var session = RavenDB.OpenSession())
        {
            // insert test data
        }
    }

    [Test]
    public void It_should_foo()
    {
        using (var session = RavenDB.OpenSession())
        {
            // run tests
        }
    }
}

You can grab it here as a gist on Github: https://gist.github.com/1940759.

Note that if you use this method, a number of sandbox databases will (of course) build up over time. You can clean these up by simply deleting the Raven DB data directories. (See the gist for an example batch file you can throw in your source control to do this.)

Object-oriented basics: single object or collection scope?

Here is a contrived example of a common SOLID violation you might see. Can you spot it?

class Mp3Encoder : IMp3Encoder
{
    public void Encode(IEnumerable<string> wavFiles)
    {
        foreach (var wavFile in wavFiles)
        {
            var outputFile = /* create output file */;

            while (/* blocks remaining... */)
            {
                var buffer = /* read block */;
                var encoded = /* encode wav block as MP3 */;
                /* write block to output file */;
            }

            /* write ID3 trailing header */;
        }
    }
}

Except in trivially simple cases, there should always be a class boundary when shifting context from coordinating a collection versus performing actions on a single object.

The class above is violating this rule — it knows how to perform collection-level responsibilities as well as single-object responsibilities. It needs to be broken into two classes; one for encoding a single file and one for coordinating the group.

This rule is a form of the Single Responsibility Principle. For example:

Collection-scoped class responsibilities
  • Coordinating ‘before all’ and ‘after all’ actions
  • Looping through items
  • Maintaining shared state (counting, accumulating etc)
Single object-scoped class responsibilities
  • Coordinating ‘before each’ and ‘after each’ actions
  • Performing actions on item

If you ignore this collection-vs-single-object contextual boundary, your classes will become messes of nested procedural code — especially when different behaviour is required for each item in the collection. Your classes will be that much harder to unit test, and you won’t easily be able to re-use them in single-object scenarios.
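As a rough sketch of what the split might look like (in Python for brevity; the class names are my own invention, and the real encoding work is stubbed out), the collection-scoped class owns the loop and shared state, while the single-object class knows only how to encode one file:

```python
class Mp3FileEncoder:
    """Single-object scope: knows how to encode exactly one file."""

    def encode(self, wav_file):
        # stand-in for the real block-by-block encode + ID3 trailer logic
        return wav_file.replace(".wav", ".mp3")


class BatchEncoder:
    """Collection scope: looping, counting, and any shared state."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.encoded_count = 0  # shared state lives here, not in the item encoder

    def encode_all(self, wav_files):
        results = [self.encoder.encode(f) for f in wav_files]
        self.encoded_count = len(results)
        return results


batch = BatchEncoder(Mp3FileEncoder())
outputs = batch.encode_all(["a.wav", "b.wav"])
```

Now Mp3FileEncoder can be unit-tested (and reused) on a single file without dragging the looping logic along with it.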

Subscribing to NuGet package updates via RSS

Update – check out nugetfeed.org for a more polished way to subscribe to NuGet package updates in your RSS reader!

Just a quick tip I found today – if you’re a NuGet package author and want to be notified when updates are published for upstream packages you depend on, you can do so by subscribing to an OData query in an RSS reader.

For example, in order to keep protobuf-net-data in sync with the latest protobuf-net, I need to publish a new package rebuilt against the latest protobuf-net every time they release a new version. For this I subscribed to the following URL in Google Reader:

http://packages.nuget.org/v1/FeedService.svc/Packages()?$filter=Id eq 'protobuf-net'

Matt Wrock has a few more advanced examples of this using ifttt.com to orchestrate sending emails and tracking package downloads etc.

Protocol Buffers DataReader Extensions for .NET

.NET, as a mostly statically-typed platform, has a lot of really good options for serializing statically-typed objects. Protocol Buffers, MessagePack, JSON, BSON, XML, SOAP, and the BCL’s own proprietary binary serialization are all great for CLR objects, where the fields can be determined from the type.

However, for data that is tabular in nature, there aren’t so many options. In my past two jobs I’ve had a need to serialize data:

  • That is tabular – not necessarily CLR DTOs.
  • Where the schema is unknown before it is deserialized – each data set can have totally different columns.
  • In a way that is streamable, so entire data sets do not have to be buffered in memory at once.
  • That can be as large as hundreds of thousands of rows/columns.
  • In a reasonably performant manner.
  • In a way that could potentially be read by different platforms.
  • Into as small a number of bytes as possible.

Protocol Buffers DataReader Extensions for .NET was born out of these needs. It’s powered by Marc Gravell’s excellent Google Protocol Buffers library, protobuf-net, and it packs data faster and smaller than the equivalent DataTable.WriteXml serialization.

Usage is very easy. Serializing a data reader to a stream:

DataTable dt = ...;

using (Stream stream = File.OpenWrite(@"C:\foo.dat"))
using (IDataReader reader = dt.CreateDataReader())
{
    DataSerializer.Serialize(stream, reader);
}

Loading a data table from a stream:

DataTable dt = new DataTable();

using (Stream stream = File.OpenRead(@"C:\foo.dat"))
using (IDataReader reader = DataSerializer.Deserialize(stream))
{
    dt.Load(reader);
}

It works with IDataReaders, DataTables and DataSets (even nested DataTables). You can download protobuf-net-data from NuGet, or grab the source from the GitHub project page.

Failing the tube test

As a Londoner working in the city, I (along with hundreds of thousands of others) catch the tube to work every day. Although you’re typically crammed in a carriage with hundreds of other commuters, the journey is a solitary one, and many passengers turn to their iPhones, iPads and Kindles to pass the time.


For me, it’s a pretty long journey, up to an hour each way to get from Parsons Green to Canary Wharf — a large portion of which is spent deep underground, with no mobile or 3G coverage on my iPhone 4. During this time you can really tell which apps’ data strategies have been properly thought out and designed, and which ones have been hacked together in a hurry.

I’m not going to mention any offending games or apps by name, but these are the sort of things developers should be shot for:

Apps that don’t cache downloaded data for later viewing

Except in special cases (e.g. streaming audio/video, realtime or sensitive content), all downloaded data must be cached locally, and able to be viewed again without an internet connection. Apps that don’t store downloaded content are just crippled web browsers.

Offline App Store reviews

It is unacceptable to nag me to rate your app in the App Store when there is no internet connection. Honestly — don’t ask users to do the impossible.

Repeated connection attempts and repeated error dialogs

It is unacceptable to repeatedly show error dialogs when there is no internet connection. By all means, try to continuously download updates, but do it silently and in the background. I do not need to be told every 30 seconds that it (still) couldn’t connect.

Locking the UI when checking for updates on startup

It is unacceptable to block the user from reading previously-cached data while downloading updated content at launch. By all means, check for updates on start-up, but while that is happening, the user must be able to read the previously-cached copy, unless they explicitly asked for a from-scratch refresh.

Throwing away cached data then failing to update

It is unacceptable to leave the user with a blank screen because you threw away previously-cached data THEN failed to update. Local data caches should not be cleared until new data has been 100% retrieved successfully and loaded.

Lost form data, please retype

It is unacceptable to ‘lose’ form values and force the user to retype a message when a submit error occurs. Form values must be saved somewhere — either automatically requeued for resubmission, or as a draft that can be edited or cancelled by the user.

Ad priorities

It is unacceptable for an ad-supported app to crash, refuse to start or behave weirdly because it couldn’t connect to the ad server. Are you worried about people using your app, or seeing ads?

Occasionally connected apps

It is unacceptable for any app to refuse to start without an internet connection. Users should always be able to change settings, view and edit previously-cached data and enqueue new requests where possible (to be submitted when a connection is available).

Final thoughts

In general, occasionally-connected computing is a good model for iPhone/Android apps that push/pull data from the Internet. It just needs to be implemented correctly. Like web designers assuming no javascript and using progressive-enhancement to add improved behaviour on top of basic functionality, mobile developers should assume no internet connection is available most of the time — jump at the opportunity to sync when possible, but don’t interrupt the user or assume it won’t fail.

So please, if you’re a mobile developer, think about us poor Londoners before you deploy your next version!