Turning a case-sensitive string into a case-insensitive string

Here’s a trick I recently picked up when dealing with document databases. Say you need to save objects whose IDs differ only by case, but you’re using a document DB like Raven, where keys are case-insensitive. In Google Books, for example, oT7wAAAAIAAJ is an article in Spanish from a Brazilian journal, but OT7WAAAAIAAJ is a book about ghosts. RavenDB cannot tell these two IDs apart, so storing both would result in a single document that gets overwritten each time. What can you do?
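To make the collision concrete, here is a minimal Python sketch. `CaseInsensitiveStore` is a hypothetical in-memory stand-in, not RavenDB's API. It simply folds every key to lowercase, which is enough to reproduce the overwrite.

```python
class CaseInsensitiveStore:
    """Hypothetical toy key-value store that, like a case-insensitive
    document DB, treats keys differing only by case as the same key."""

    def __init__(self):
        self._docs = {}

    def store(self, key, doc):
        # The key is folded to lowercase, so its case information is lost.
        self._docs[key.lower()] = doc

    def load(self, key):
        return self._docs.get(key.lower())

    def __len__(self):
        return len(self._docs)


store = CaseInsensitiveStore()
store.store("oT7wAAAAIAAJ", "Spanish article from a Brazilian journal")
store.store("OT7WAAAAIAAJ", "Book about ghosts")

print(len(store))                  # 1: both IDs landed on the same key
print(store.load("oT7wAAAAIAAJ"))  # Book about ghosts: the article was overwritten
```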

If it were the other way around (the database is case-sensitive but the app is not), simply discarding the case information by converting every key to lowercase would do the trick. That transformation is lossy, but the app doesn’t need the case anyway.
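A minimal sketch of that lowercase normalization, assuming a hypothetical `normalize_key` helper applied to every key before it reaches a case-sensitive store (a plain Python dict plays the store here):

```python
def normalize_key(app_key: str) -> str:
    # Lossy: "ABC", "abc" and "AbC" all become "abc". That is fine here,
    # because the application already treats them as the same key.
    return app_key.lower()


docs = {}  # a plain dict is case-sensitive, like the database in this scenario
docs[normalize_key("UserName")] = {"id": 1}

# Lookups with any casing hit the same entry.
print(docs[normalize_key("username")])
print(docs[normalize_key("USERNAME")])
```

For keys outside ASCII, Python's `str.casefold()` is a stricter alternative to `lower()`.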

Our situation is harder, however: we need to represent the key as a string that preserves every letter and also records whether each letter was uppercase. You could write a custom converter for this (maybe using special escape characters to mark uppercase letters), but a much easier way is simply to encode the key as Base32.
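Here is a rough sketch of the round trip using Python's standard `base64` module. The helper names `to_store_key` and `from_store_key` are made up for illustration, and the sketch assumes IDs are text that gets encoded as UTF-8 before the Base32 step.

```python
import base64


def to_store_key(case_sensitive_id: str) -> str:
    """Encode an ID as RFC 4648 Base32, whose alphabet has no lowercase letters."""
    raw = case_sensitive_id.encode("utf-8")
    return base64.b32encode(raw).decode("ascii")


def from_store_key(store_key: str) -> str:
    """Decode a Base32 store key back to the original ID.

    casefold=True also accepts lowercase input, in case the store
    hands the key back in a different case than it was written."""
    raw = base64.b32decode(store_key, casefold=True)
    return raw.decode("utf-8")


article = to_store_key("oT7wAAAAIAAJ")
ghost_book = to_store_key("OT7WAAAAIAAJ")

print(article)
print(ghost_book)
print(article.lower() != ghost_book.lower())    # True: distinct even when case is ignored
print(from_store_key(article) == "oT7wAAAAIAAJ")  # True: the encoding is reversible
```

The output ends in = padding characters. They have no case, so they don't affect uniqueness, but if you strip them you have to add them back before decoding.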

Why Base32? Base64 would produce shorter strings (it is a more efficient encoding), but its alphabet uses both upper- and lowercase letters, so two different keys can still encode to strings that differ only by case. Base32 (as defined in RFC 4648), on the other hand, uses only uppercase letters, the digits 2-7 and the = padding character, so it is safe to use as a key in a case-insensitive key-value store.
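A small Python illustration of that collision risk. It uses raw bytes rather than text IDs, because such a pair is easy to find by hand: b"A" and b"\xa9" have Base64 encodings that differ only by case, while their Base32 encodings stay distinct.

```python
import base64

b64_first = base64.b64encode(b"A").decode("ascii")      # "QQ=="
b64_second = base64.b64encode(b"\xa9").decode("ascii")  # "qQ=="

# Different inputs, yet the encodings differ only by case, so a
# case-insensitive store would treat them as the same key.
print(b64_first, b64_second, b64_first.lower() == b64_second.lower())

# Base32 output for the same inputs is all uppercase, so no such clash.
b32_first = base64.b32encode(b"A").decode("ascii")      # "IE======"
b32_second = base64.b32encode(b"\xa9").decode("ascii")  # "VE======"
print(b32_first, b32_second, b32_first.lower() == b32_second.lower())
```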

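Hex would work too (its only letters are A-F, and their case carries no meaning), but it needs even more space: two characters per byte, versus 1.6 for Base32 and about 1.33 for Base64, ignoring padding.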
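To put numbers on the space trade-off, this sketch compares encoded lengths for a few input sizes. The all-zero inputs are arbitrary; only their length matters, and `encoded_lengths` is a hypothetical helper.

```python
import base64


def encoded_lengths(data: bytes) -> dict:
    """Length in characters of each encoding of `data`."""
    b32 = base64.b32encode(data).decode("ascii")
    return {
        "base64": len(base64.b64encode(data)),
        "base32": len(b32),
        "base32 (no padding)": len(b32.rstrip("=")),
        "hex": len(data.hex()),
    }


for n in (12, 20, 40):
    print(n, encoded_lengths(bytes(n)))
```

For a 20-byte key the lengths are 28 (Base64), 32 (Base32) and 40 (hex). For short keys, Base32 padding can eat the advantage: a 12-byte key such as oT7wAAAAIAAJ takes 24 characters in both padded Base32 and hex, or 20 in Base32 without padding.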

April 2, 2012
