The road to automated database deployment

Working in an agile team means delivering software early and often — several times a day even. The only way to achieve this without going crazy doing releases all the time is to eliminate all the manual steps in your release process, until you’re left with a single mouse-click to get your code into production.

One of the biggest challenges here is how to handle your app’s back-end database. Application files can be deleted and restored in seconds, and websites rolled back and replaced a thousand times, but the database is a living, breathing beast, closely guarded by a separate team of DBAs who usually won’t even let you see what state it’s in, let alone make changes. If you screw up a database script, it could cost hours of downtime restoring from a backup. So how do you automate changes? Here’s how we did it on a recent project.

Pre-requisite #1: portable script shipping

I’ll never forget the deployment process on my first major commercial project. During development, everyone shared a ‘working copy’ database, directly adding columns and tables via GUI tools. On release day, a designated person would open up SQL Compare, connect to the dev database, connect to the live database, and generate a script copying all the changes from dev to production.

Needless to say, this was a fairly sub-optimal solution. Here are some of the problems we developers encountered on a regular basis:

  • The diff scripts only worked between specific database versions. If UAT was on a faster development cycle than PROD, we couldn’t use the same scripts between them.
  • Only the final diff script was put in source control, so if the development database crashed or was accidentally wiped in between release cycles, we lost all our changes.
  • SQL Compare can reverse-engineer schema changes, but it can’t reverse-engineer data changes or migrations between tables, so it was never a complete solution anyway.

The last one was the real nail in the coffin for us. If you already have to script data changes by hand… why not go the whole hog and write the ALTER/CREATE statements too? That way you can just check each change into source control as a separate SQL script — numbered, so you know which order they go in. No more generating scripts, no more losing our work in the dev database. No more SQL Compare.
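For example, the scripts folder in source control might end up looking something like this (the file names here are purely illustrative):

0001_initial_schema.sql
0002_create_orders.sql
...
0016_customer_info.sql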

Pre-requisite #2: database version tracking table

Given a live database and a stack of SQL scripts, how do you know which ones need to be run? You could start going through them, statement by statement, until you find a change that doesn’t appear to have been applied, and start there — or you could simply have each script write a row to a changelog table like this:

ALTER TABLE ...;
DROP TABLE ...;
-- other DDL changes
INSERT INTO CHANGELOG VALUES (16, '0016_customer_info.sql', GETDATE(), USER);

That way, it becomes trivial to find out which scripts have been run, and which still need to be run, against a database. And putting the special INSERT at the end of each change script ensures a row only gets added if all the preceding statements executed successfully.

Pre-requisite #3: database version awareness

When is increasing coupling in your application a good thing? When it’s between the database schema version and the application running on top of it. Your physical data model and your application’s persistence model (ORM mappings etc) have to move in lockstep, so it’s a good idea for the application to refuse to start if the database version isn’t the one it expects.

You can do this quite easily by checking the highest-run script number in the CHANGELOG table, or by introducing a separate VERSION table. Whatever strategy you use, a loud error on startup is much easier to diagnose than messy runtime errors when a column turns out to be missing (fail-fast).
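Here’s a minimal sketch of such a startup check, assuming the CHANGELOG table described above and an expected version baked into the build (the class and column names are illustrative, not our actual code):

using System;
using System.Data.SqlClient;

static class SchemaVersionCheck
{
    // The version the application's persistence model was built against.
    const int ExpectedSchemaVersion = 16;

    public static void AssertDatabaseVersion(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT MAX(change_number) FROM CHANGELOG", connection))
        {
            connection.Open();
            int actual = Convert.ToInt32(command.ExecuteScalar());

            // Fail fast: a loud error at startup is easier to diagnose
            // than a missing-column error at runtime.
            if (actual != ExpectedSchemaVersion)
                throw new InvalidOperationException(string.Format(
                    "Expected database version {0}, found {1}.",
                    ExpectedSchemaVersion, actual));
        }
    }
}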

Pre-requisite #4: script privileges

To make our lives easier, we introduced a new rule: all change scripts must be runnable under the same database user/privileges as the application itself. This means we can use the application’s own connection string to run scripts, and it eliminates the need for DBAs or for sharing the database administrator password.

Running scripts under the application account means we have to grant our services rights to create and modify tables, views, stored procedures etc (i.e. Oracle’s RESOURCE or MS SQL’s db_ddladmin role). Before you freak out about applications running with elevated privileges, remember: one, this is not a shared database — it’s for the exclusive use of our application, and there is only one user; and two, this application does not run on desktops, so there is no risk of users uncovering the credentials and logging in themselves.

This is the same self-upgrading database model WordPress uses, and we justified that if it’s good enough for the 12 million WordPress sites out there on the greater Internet, then it’s good enough for one little Windows Service running in a locked-down server network.

Anyway. If a script does require system privileges to run, it probably has something to do with storage or security — e.g. storage files or table partitioning — that isn’t actually required for the application to run, and thus doesn’t need to be part of the official database version history. Those scripts can be run independently of the automatic-upgrade process, as required, by DBAs.

Pre-requisite #5: clean up existing installations

We have a number of copies of our applications running in the wild, in various states of disrepair. Scripts that were never run, scripts that ran with ‘continue on error’ enabled, ad hoc customizations by DBAs, and missing entries in the CHANGELOG table all needed to be smoothed over to provide a solid platform before any automated process could take over. In our case we did it by hand (ironically, using SQL Compare), but depending on the number of installations out there and the effort required, you may wish to incorporate these clean-up steps into your automated scripts.

End goal: self-upgrading application

From this point, it’s relatively easy to implement an automated upgrade process that checks the CHANGELOG table to find out which scripts haven’t been run yet, and runs them. That way it works against any previous database version. Our upgrader tool also performs a number of preparatory and clean-up steps, such as creating the CHANGELOG table if required, and removing any vestigial remains of the previous database deployment process.
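Here’s a stripped-down sketch of that core upgrade loop, assuming numbered script files like 0016_customer_info.sql and the CHANGELOG table from earlier (the helper names are illustrative, and real scripts containing GO batch separators would need to be split before execution):

using System;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

static class DatabaseUpgrader
{
    public static void Upgrade(string connectionString, string scriptsFolder)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // Find the highest script number already recorded in the CHANGELOG.
            int currentVersion;
            using (var command = new SqlCommand(
                "SELECT ISNULL(MAX(change_number), 0) FROM CHANGELOG", connection))
            {
                currentVersion = Convert.ToInt32(command.ExecuteScalar());
            }

            // Sort the scripts by number and skip any that have already run.
            var pending = Directory.GetFiles(scriptsFolder, "*.sql")
                .OrderBy(path => ScriptNumber(path))
                .Where(path => ScriptNumber(path) > currentVersion);

            foreach (string path in pending)
            {
                using (var command = new SqlCommand(File.ReadAllText(path), connection))
                    command.ExecuteNonQuery();
                // No separate bookkeeping needed here: each script's final
                // INSERT records itself in the CHANGELOG.
            }
        }
    }

    static int ScriptNumber(string path)
    {
        // e.g. '0016_customer_info.sql' -> 16
        return int.Parse(Path.GetFileName(path).Split('_')[0]);
    }
}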

This process can be designed, unit tested, and audited to a greater degree than any human process could ever achieve. You can run it hundreds of times to tweak it, and test it against all sorts of different database configurations with minimal effort. It can and should become part of your continuous integration and automated builds and tests.

Failure and backups

Unfortunately, even with all the extra confidence you get from scripting, there is always a risk of something going wrong on release day. One part of the manual process we’re still missing is backups, so we’re currently looking into adding an automated backup step before the upgrade kicks off (probably a lightweight one, using something like a Snapshot in MS SQL or Flashback in Oracle). That way, we can eventually aim to provide automatic rollback in the event a script fails.
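On MS SQL, for example, that step could be as simple as creating a database snapshot before the first script runs. A rough sketch (the database, file and logical names are all illustrative, and the account needs rights to create snapshots):

using System.Data.SqlClient;

static class PreUpgradeBackup
{
    public static void CreateSnapshot(string connectionString)
    {
        // Snapshots are near-instant, and can be restored with
        // RESTORE DATABASE ... FROM DATABASE_SNAPSHOT if a script fails.
        const string sql = @"
            CREATE DATABASE AppDb_PreUpgrade
            ON (NAME = AppDb_Data, FILENAME = 'C:\Snapshots\AppDb_PreUpgrade.ss')
            AS SNAPSHOT OF AppDb";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}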

Final words

I’m not saying that ours is the only way of achieving automated database deployment, or that it’s necessarily the best way. We’re still working out the kinks, but I’d say it’s looking pretty good so far.

Stick to the paradigm (even if it sucks)

Today I had the pleasure of fixing a bug in an unashamedly-procedural ASP.NET application, composed almost entirely of static methods with anywhere from 5 to 20 parameters each (yuck yuck yuck).

After locating the bug and devising a fix, I hit a wall. I needed some additional information that wasn’t in scope. There were three places I could get it from:

  • The database
  • Form variables
  • The query string

The problem was knowing which to choose, because it depended on the lifecycle of the page — is it the first hit, a reopened item, or a post back? Of course that information wasn’t in scope either.

As I contemplated how to figure out the state of the page using HttpContext.Current, my spidey sense told me to stop and reconsider.

Let’s go right back to basics. How did we use to manage this problem in procedural languages like C? There is a simple rule — always pass everything you need into the function. Global variables and hidden state may be convenient in the short term, but only serve to confuse later on.

To fix the problem, I had to forget about “this page” as an object instance. I had to forget about private class state. I had to forget that C# was an object-oriented language. Those concepts were totally incompatible with this procedural code base, and any implementation would likely result in a big mess.

In fact, to avoid both a DRY violation (working out the page state all over again) and a hidden dependency on HttpContext, it turned out the most elegant solution was simply to add an extra parameter to every method in the stack. So wrong from an OO/.NET standpoint, but so right for the procedural paradigm in place.
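To give the flavour of it (a contrived sketch, not the actual code), the page state simply gets passed down the call stack along with everything else:

enum PageLifecycle { FirstHit, Reopened, PostBack }

static class ItemPage
{
    public static void RenderItem(int itemId, string userName,
        PageLifecycle lifecycle /* ...and the other 17 parameters */)
    {
        LoadItemDetails(itemId, lifecycle);
    }

    static void LoadItemDetails(int itemId, PageLifecycle lifecycle)
    {
        // This method can now choose between the database, form variables
        // and the query string without groping around in HttpContext.Current.
    }
}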

Architectural examples: how does your app fit together?

I had the pleasure of hearing a talk by Stewart Baird today, and I liked one of his ideas so much I thought I’d mention it here.

Object-relational mappers, IoC, ESBs, REST etc are all a big deal when you’re introducing them to teams or working with contractors who haven’t used them before.

To explain how it all fits together, you should create an architectural example — a reference system that implements a single, simple behaviour. To be effective, it should:

  • Be end-to-end. It should demonstrate how events and data flow from the user interface through to the database and back, including any and all services in between.
  • Be checked into source control. Publish it alongside the production code so everyone can see it.
  • Lead your production code. When you introduce a new component — e.g. a distributed cache — add it to the architectural example first, before rolling it out to your main system.

It’s always nice if you can explore and see everything working for yourself. An architectural example can help explain new topics to less-experienced developers, and provide a nice reference of how you want to see things implemented in future.

IUnityContainerAccessor must use a static container instance

I found an interesting problem this morning when my ASP.NET MVC application mysteriously broke after adding an HttpModule in the web.config. Here’s the problem code:

public class MvcApplication : HttpApplication, IUnityContainerAccessor
{
    IUnityContainer container;

    public IUnityContainer Container
    {
        get { return container; }
    }

    public void Application_Start()
    {
        this.container = new UnityContainer();
        // this.container.RegisterTypes etc
    }
}

The container was being configured fine in Application_Start, but then UnityControllerFactory would throw “The container seems to be unavailable in your HttpApplication subclass” exceptions every time you tried to load a page — this.container was somehow null again.

After doing a little digging and finding this article where someone had the same problem with Windsor, it seems ASP.NET will create multiple HttpApplication instances when parallel requests are received. However, Application_Start only gets called once, so anything you would like to share between multiple instances (e.g. your IoC container) must be static:

public class MvcApplication : HttpApplication, IUnityContainerAccessor
{
    static IUnityContainer container; // good: shared between HttpApplication instances
    // IUnityContainer container;     // bad: a separate field per instance

    public IUnityContainer Container
    {
        get { return container; }
    }
}

Domain entities vs presentation model objects

This is the first half of a two-part article. Read the second half here: Domain entities vs presentation model objects, part 2: mapping.

When putting domain driven design (DDD) principles to practice in a real application, it’s important to recognize the difference between your domain model and presentation model.

Here’s a simple real-life example from a little MVC application that displays a to-do list of tasks. The task domain entity is very simple, with an identifier, a name and an optional due date:

// task domain entity
public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
    // ...etc
}

Pretty much everything we want to know about this Task can be derived from its properties via Specifications. For example, an OverDueTaskSpecification will tell us whether or not the task is overdue by checking if the due date has already passed, and an UnscheduledTaskSpecification will tell us whether the task is unscheduled by checking if the due date is null.
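A specification is just a tiny object that answers a yes/no question about an entity. Here’s a minimal sketch of the first one (my actual implementation may differ slightly):

// specification: is this task overdue?
public class OverDueTaskSpecification
{
    public bool IsSatisfiedBy(Task task)
    {
        // overdue = has a due date, and that date has already passed
        return task.DueDate.HasValue && task.DueDate.Value < DateTime.Now;
    }
}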

However, when rendering the task to the user, the application’s view must remain dumb — a passive view — and cannot work this sort of stuff out for itself. It is not enough to simply pass a collection of domain entities to the view; all relevant details must be explicitly provided so the view has all the information it requires without having to do any work itself.

Together, all these UI-specific details form a presentation model object, which is effectively identical to a DTO in that it only carries data, with no methods or behaviour of its own (some people call them PTOs). Here’s what my application’s presentation model object for a task looks like:

// task presentation model object
public class TaskView
{
    public int Id;
    public string Name;
    public string DueDate;
    public bool IsUnscheduled;
    public bool IsOverDue;
    public long SortIndex;
}

The ID and name fields are exactly the same, but DueDate is now a string which will hold either a friendly-formatted date, or ‘Anytime’ if the task is not scheduled. Unscheduled and overdue are now explicit flags, so the view can immediately identify tasks that need special display treatment like highlighting.

Most of the front-end user interaction in this application is implemented via JavaScript, so we need an index for sorting and appending tasks in the correct order. It needs to be simple to parse and fast to compare, so I chose a long integer which is resolved from the due date.
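The actual mapping is covered in part two, but to give an idea (an illustrative fragment, not the exact code), the index can be resolved directly from the underlying date:

// unscheduled tasks sort last; otherwise, earlier due dates come first
view.SortIndex = task.DueDate.HasValue
    ? task.DueDate.Value.Ticks
    : long.MaxValue;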

Note this is a very simple example, with only one domain entity class to which we just added some new properties. Most real scenarios will require some degree of flattening aggregate object graphs into a single class, and hiding fields which are not relevant.

In the big scheme of things, presentation model objects are defined in the application layer (see the Onion Architecture), where they are validated and mapped to and from domain entities. All public interfaces to the application layer are expressed in terms of presentation model objects, so domain entities do not leak into the UI layer.

In an ASP.NET MVC application for example, presentation model objects would be used as the Models for all views and action results.

Update: a friend asked me why I don’t format DueDate in the view, as proper MVC separation of concerns would dictate. The reason is that as well as rendering tasks directly in the view when a page is requested, this application also sends presentation model objects to the client via AJAX. I decided that having one single place (instead of two) where tasks’ DueDates are formatted would make maintenance easier in the long run.

Back to basics: exception handling in .NET

Following on from my article on good source control check-in habits, I’ve got a few tips I’d like to share on exceptions in .NET. These are basically all responses to things I’ve seen done in production code before.

Write a method that throws — don’t return success/failure codes

Here’s a C function I wrote for an application a couple of years back that listens for messages via UDP:

int bind_udp_server(struct udp_server * server)
{
        server->socket_descriptor = socket(AF_INET, SOCK_DGRAM, 0);
        if (-1 == server->socket_descriptor)
                return -1; /* bail out */

        if (!set_non_blocking(server->socket_descriptor))
                return -1; /* bail out */

        ...

        if (-1 == bind(server->socket_descriptor,
                (struct sockaddr *) &server_addr,
                sizeof(server_addr)))
                return -1; /* bail out */

        return 0; /* socket bound successfully */
}

C has no formal mechanism for functions to signal to their caller that exceptional behaviour was encountered. Instead, they typically return an integer status code to indicate success or failure (e.g. 0 and -1 in the function above).

Clients using these functions must check for errors after every call. Because of this:

  • Your real application code can be much harder to follow when it’s hidden amongst all the error handling code (which probably only executes in rare circumstances anyway).
  • If you accidentally omit a return code check and something fails, your program will continue happily executing as if nothing happened. Problems might not become apparent until several operations later, making it much harder to track down the original issue.
  • Because you have to check for errors the whole way (and at different levels), there will be a lot of duplicated code, violating the DRY principle.
  • Error codes can overlap with legitimate return values, making it hard to indicate when an actual error has occurred. This is known as the semipredicate problem.
  • Sometimes things happen that are so catastrophic, you don’t even bother with a strategy for trying to cope with them. For example, it might be perfectly acceptable to die if malloc() fails, unless you’re fully equipped to keep your program running when the machine is out of memory.

.NET languages are free from these worries because they use a different strategy: assume everything will always succeed (try), and handle any problems later in one single location (catch).

public void Foo()
{
    try
    {
        DoSomething();
        DoSomethingElse();
        DoThirdThing();
    }
    catch
    {
        // bail out
    }
}

This lets you focus on the success case (the one that actually matters), instead of cluttering your code up with error handling.

Unfortunately, it seems a lot of .NET developers only think of exceptions as things you need to catch when calling the Base Class Library, and shy away from throwing and catching their own exceptions. Instead, I see kludges like:

if (!DoSomething())
   // bail out

Or even this one, which simulates Exception.Message:

string errorMessage = DoSomething();
if (errorMessage != "")
   // bail out

The golden rule is, if your method encounters a condition that prevents it from achieving its intended purpose, it should throw an exception.
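For example, here’s what that might look like in practice (a contrived sketch; the types and business rules are invented for illustration):

public void Transfer(Account from, Account to, decimal amount)
{
    if (amount <= 0)
        throw new ArgumentOutOfRangeException("amount",
            "Transfer amount must be positive.");

    if (from.Balance < amount)
        throw new InvalidOperationException(
            "Insufficient funds to complete the transfer.");

    // the success case reads straight through, with no status codes to check
    from.Withdraw(amount);
    to.Deposit(amount);
}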

Re-throwing exceptions

Do you know the difference between the two following code snippets?

catch (Exception ex)
{
        // log the exception
        ...

        // re-throw
        throw ex;
}

catch (Exception ex)
{
        // log the exception
        ...

        // re-throw
        throw;
}

The first example resets ex’s stack trace to the current location. This is fine when you’re throwing a brand new exception, but not so good if you’re just passing one along — you’ll lose the stack trace telling you where it originated from.

To preserve the stack trace, just use “throw” by itself with no argument, as in the second example.

Use parameterless catch/throw if you don’t care about the details

You don’t have to specify any parameter name in your catch block if you don’t need the exception object itself:

catch (SqlException)
{
        // rollback transaction
        // rethrow
        throw;
}

This is handy for eliminating that “variable ‘ex’ is not used” compiler warning. In fact, if you don’t care about the type, you can get rid of the brackets altogether:

catch // anything
{
        // handle the exception
}

But be careful with this one — in 99% of situations, we do care what the type is.

Only catch exceptions you can handle

Imagine you are working on some code that tries to parse some user input as an integer. If it’s invalid (e.g. not a number), we’ll display an error message.

int value = 0;
try
{
        // forget about TryParse() for the moment :)
        value = Int32.Parse(someUserInput);
}
catch (Exception ex)
{
        // invalid input
}

What’s wrong with this? We’ve specified that this catch block is capable of handling any type of exception, no matter how severe or unrelated. They’ll all be interpreted as validation errors — OutOfMemoryException, ThreadAbortException — even NullReferenceException (which might indicate an actual bug in the code).

In reality, however, all we really care about is a very narrow subset of exceptions: those that indicate the number couldn’t be parsed correctly.

int value = 0;
try
{
        value = Int32.Parse(someUserInput);
}
catch (FormatException ex)
{
        // invalid input
}

Only catch exceptions you anticipate could be thrown as a consequence of the code within your try block. Don’t try to handle anything outside that window, unless you have a specific strategy to deal with it.

Multiple catch blocks

Here’s an example of some code I saw at my last job, that did different things depending on the type of exception:

try
{
        ...
}
catch (Exception ex)
{
        if (ex is FileNotFoundException)
                // do stuff
        else if (ex is IOException)
                // do stuff
        else
                // ???
}

Yuck — using runtime type checks and a big if-statement to differentiate exception types is generally a sign of bad code in .NET (or any OO language), and catch blocks are no exception (ba-dum-psh). Instead, use multiple catch blocks for the different exception types. At runtime, the .NET framework will invoke the first matching catch block, so you need to specify them in most-to-least-specific order:

try
{
        ...
}
catch (FileNotFoundException ex)
{
        // file not found (subclass of IOException)
}
catch (IOException ex)
{
        // some other file-related error
}

Inner exceptions

In .NET, low-level exceptions can be wrapped up in a higher-level context that makes more sense in the grand scheme of things. For example, a CouldNotOpenDocumentException might be caused by a FileNotFoundException. This is reflected in the Exception.InnerException property, and each inner exception is complete with its own error message and stack trace.

Here’s some code I saw once for unwrapping them, via a loop:

try
{
        ...
}
catch (Exception ex)
{
        // get exception details
        string errMessage = ex.Message + ex.StackTrace;

        Exception innerException = ex.InnerException;
        while (innerException != null)
        {
                errMessage += innerException.StackTrace;
                innerException = innerException.InnerException;
        }

        // log errMessage
}

This code is pretty redundant (and ugly), as the .NET framework will do this for you. Exception.ToString() will print out the exception’s message and stack trace, then call itself on the inner exception:

try
{
        ...
}
catch (Exception ex)
{
        // get full stack trace
        string errMessage = ex.ToString();

        // log errMessage
}

This will return a complete dump of the inner exception tree, producing a message like:

Could not get OSM Status menu item status. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at SMS.Common.GetOSMMenuStatus(Space oSpace) in D:\Dev\XYZ\Source\XYZ.Website\Code\Common\Common.cs:line 1051
   --- End of inner exception stack trace ---
   at SMS.Common.GetOSMMenuStatus(Space oSpace) in D:\Dev\XYZ\Source\XYZ.Website\Code\Common\Common.cs:line 1058
   at SMS.Common.AppendSMSNav(Space& oSpace, DateTime& dtSMSDate, DateTime& dtStartDate, DateTime& dtEndDate) in D:\Dev\XYZ\Source\XYZ.Website\Code\Common\Common.cs:line 837

Another tip is for finding the root inner exception — the original cause. The developers from the previous example chose to drill through InnerExceptions in a loop until they reached the bottom (null). An easier way is to just call Exception.GetBaseException().
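For example, to grab the original cause in a single call:

catch (Exception ex)
{
        // jump straight to the original cause, however deeply it is nested
        Exception rootCause = ex.GetBaseException();

        // log rootCause.Message, rootCause.StackTrace etc.
}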

Defensive coding

To write robust and idiot-proof code, you have to assume people are going to try to break it. This means checking input parameters are valid at the start of every method.

You should continue this habit all throughout internal application methods as well, so bad data gets stopped short at the point of origin, instead of trickling through and causing problems later on.

public class ProjectsController : Controller
{
    IProjectRepository projectRepository;

    public ProjectsController(IProjectRepository projectRepository)
    {
        // better to have an exception here, when the controller is constructed...
        if (projectRepository == null)
            throw new ArgumentNullException("projectRepository");

        this.projectRepository = projectRepository;
    }

    public ActionResult Detail(string name)
    {
        // ...than down here, when an action gets called.
        Project p = this.projectRepository.GetByName(name);
        return View(p);
    }

    ...
}

Note that Microsoft’s Spec# contracts will make these sorts of checks much easier in future, with built-in syntax for non-nullable parameters:

public ProjectsController(IProjectRepository! projectRepository)
{
    // method will not be entered if projectRepository is null
    this.projectRepository = projectRepository;
}

These rules are enforced at compile time as well! So the following line would not build:

ProjectsController controller = new ProjectsController(null); // error

See Microsoft’s article on Best Practices for Handling Exceptions for more .NET exception tips.

St9exception with libstdc++

Here’s something I encountered today when writing some C++:

try
{
    throw std::runtime_error("some message");
}
catch (std::exception e)
{
    std::cout << "error: " << e.what() << std::endl;
}

When run, this code writes “error: St9exception” instead of “error: some message” to stdout. “St9exception” comes from libstdc++, in which the default value returned by std::exception::what() is the mangled type name. The mistake was that I was catching the exception by value, not by reference: the copy sliced the std::runtime_error down to its std::exception base, losing the derived object and its message along the way. (Too much C# perhaps?)

Instead it should, of course, have been:

try
{
    throw std::runtime_error("some message");
}
catch (const std::exception & e)
{
    std::cout << "error: " << e.what() << std::endl;
}