Back to basics: exception handling in .NET

Back to basics: exception handling in .NET

Following on from my article on good source control check-in habits, I’ve got a few tips I’d like to share on exceptions in .NET. These are basically all responses to things I’ve seen done in production code before.

Write a method that throws — don’t return success/failure codes

Here’s a C function I wrote a couple of years ago for an application a couple of years back that listens for messages via UDP:

int bind_udp_server(struct udp_server * server){        server->socket_descriptor = socket(AF_INET, SOCK_DGRAM, 0);        if (-1 == server->socket_descriptor)                return -1; /* bail out */                if (!set_non_blocking(server->socket_descriptor))                return -1;  /* bail out */                ...                if (-1 == bind(server->socket_descriptor,                        (struct sockaddr *) &server_addr,                        sizeof(server_addr)))                return -1; /* bail out */                return 0; /* socket bound successfully */}

C has no formal mechanism for functions to return to their caller when exceptional behaviour is encountered. Instead they typically return an integer status code (e.g. 1 or 0) to indicate success or failure.

Clients using these functions must check for errors after every call. Because of this:

  • Your real application code can be much harder to follow when it’s hidden amongst all the error handling code (which probably only executes in rare circumstances anyway).
  • If you accidentally omit a return code check and something fails, your program will continue happily executing as if nothing happened. Problems might not become apparent until several operations later, making it much harder to track down the original issue.
  • Because you have to check for errors the whole way (and at different levels), there will be a lot of duplicated code, violating the DRY principle.
  • Error codes can overlap with legitimate return values, making it hard to indicate when an actual error has occurred. This is known as the semipredicate problem.
  • Sometimes things happen that are so catastrophic, you don’t even bother with a strategy for trying to cope with them. For example, it might be perfectly acceptable to die if malloc() fails, unless you’re fully equipped to keep your program running when the machine is out of memory.

Languages like .NET are free from these worries because they use a different strategy: assume everything will always succeed (try), and handle any problems later in one single location (catch).

public void Foo(){    try    {        DoSomething();        DoSomethingElse();        DoThirdThing();    }    catch    {        // bail out    }}

This lets you focus on the success case (the one that actually matters), instead of cluttering your code up with error handling.

Unfortunately, it seems a lot of .NET developers only think of exceptions as things you need to catch when calling the Base Class Library, and shy away from throwing and catching their own exceptions. Instead, I see kludges like:

if (!DoSomething())   // bail out

Or even this one, which simulates Exception.Message:

string errorMessage = DoSomething();if (errorMessage != "")   // bail out

The golden rule is, if your method encounters a condition that prevents it from achieving its intended purpose, it should throw an exception.

Re-throwing exceptions

Do you know the difference between the two following code snippets?

catch (Exception ex){        // log the exception        ...                // re-throw        throw ex;}catch (Exception ex){        // log the exception        ...                // re-throw        throw;}

The first example resets ex’s stack trace to the current location. This is fine when you’re throwing a brand new exception, but not so good if you’re just passing one along — you’ll lose the stack trace telling you where it originated from.

To preserve the stack trace, just use “throw” by itself with no argument, as in the second example.

Use parameterless catch/throw if you don’t care about the details

You don’t have to specify any parameter name in your catch block if you don’t need the exception object itself:

catch (SqlException){        // rollback transaction        // rethrow        throw;}

This is handy for eliminating that “variable ‘ex’ is not used” compiler warning. In fact, if you don’t care about the type, you can get rid of the brackets altogether:

catch // anything{        // handle the exception}

But be careful with one — in 99% of situations, we do care what the type is.

Only catch exceptions you can handle

Imagine you are working on some code that tries to parse some user input as an integer. If it’s invalid (e.g. not a number), we’ll display an error message.

int value = 0;try{        // forget about TryParse() for the moment :)        value = Int32.Parse(someUserInput);}catch (Exception ex){        // invalid input}

What’s wrong with this? We’ve specified that this catch block is capable of handling any type of exception, no matter how severe or unrelated. They’ll all be interpreted as validation errors — OutOfMemoryException, ThreadAbortException — even NullReferenceException (which might indicate an actual bug in the code).

In reality, however, all we really care about is a very narrow subset of exceptions: those that indicate the number couldn’t be parsed correctly.

int value = 0;try{        value = Int32.Parse(someUserInput);}catch (FormatException ex){        // invalid input}

Only catch exceptions you anticipate could be thrown as a consequence of the code within your try block. Don’t try to handle anything outside that window, unless you have a specific strategy to deal with it.

Multiple catch blocks

Here’s an example of some code I saw at my last job, that did different things depending on the type of exception:

try{        ...}catch (Exception ex){        if (ex is FileNotFoundException)                // do stuff        else if (ex is IOException)                // do stuff        else                // ???}

Yuck — using reflection and a big if-statement to differentiate types is generally a sign of bad code in .NET (or any OO language), and catch blocks are no exception (ba-dum-psh). Instead, use multiple catch blocks with overloads for different exception types. At runtime, the .NET framework will call the first matching catch block, so you need to specify them in most-to-least specific order:

try{        ...}catch (FileNotFoundException ex){        // file not found (subclass of IOException)}catch (IOException ex){        // some other file-related error}

Inner exceptions

In .NET, low-level exceptions can be wrapped up into a higher-level context that makes more sense in the grand scheme of things. For example, a CouldNotOpenDocumentException might be caused by a FileNotFoundException. This is reflected in the Exception.InnerException property, and inner exception each is complete with its own error message and stack trace.

Here’s some code I saw once for unwrapping them, via a loop:

try{        ...}catch (Exception ex){        // get exception details        string errMessage = ex.Message + ex.StackTrace;        Exception innerException = ex.InnerException;        while (innerException != null)        {                errMessage += innerException.StackTrace;                innerException = innerException.InnerException;        }        // log errMessage}

This code is pretty redundant (and ugly), as the .NET framework will do this for you. Exception.ToString() will print out the exception’s message and stack trace, then call itself on the inner exception:

try{        ...}catch (Exception ex){        // get full stack trace        string errMessage = ex.ToString();        // log errMessage}

This will return a complete dump of the inner exception tree, producing a message like:

Could not get OSM Status menu item status. ---> System.NullReferenceException: Object reference not set to an instance of an object.   at SMS.Common.GetOSMMenuStatus(Space oSpace) in D:DevXYZSourceXYZ.WebsiteCodeCommonCommon.cs:line 1051   --- End of inner exception stack trace ---   at SMS.Common.GetOSMMenuStatus(Space oSpace) in D:DevXYZSourceXYZ.WebsiteCodeCommonCommon.cs:line 1058   at SMS.Common.AppendSMSNav(Space& oSpace, DateTime& dtSMSDate, DateTime& dtStartDate, DateTime& dtEndDate) in D:DevXYZSourceXYZ.WebsiteCodeCommonCommon.cs:line 837

Another tip for is finding the root inner exception — the original cause. The developers from the previous example chose to drill through InnerExceptions in a loop until they reached the bottom (null). An easier way would just be to call Exception.GetBaseException().

Defensive coding

To write robust and idiot-proof code, you have to assume people are going to try to break it. This means checking input parameters are valid at the start of every method.

You should continue this habit all throughout internal application methods as well, so bad data gets stopped short at the point of origin, instead of trickling through and causing problems later on.

public class ProjectsController : Controller{    IProjectRepository projectRepository;        public ProjectsController(IProjectRepository projectRepository)    {        // better to have an exception here, when the controller is constructed...        if (projectRepository == null)            throw new ArgumentNullException("projectRepository");        this.projectRepository = projectRepository;    }    public ActionResult Detail(string name)    {        // ...than down here, when an action gets called.        Project p = this.projectRepository.GetByName(name);        return View(p);    }    ...}

Note that Microsoft’s Spec# contracts will make these sorts of checks much easier in future, with built-in syntax for not-nullable parameters:

public ProjectsController(IProjectRepository! projectRepository){    // method will not be entered if projectRepository is null    this.projectRepository = projectRepository;}

Plus these rules are enforced at compile time as well! So the following line would not build:

ProjectsController controller = new ProjectsController(null); // error

See Microsoft’s article on Best Practices for Handling Exceptions for more .NET exception tips.

Find missing foreign/primary keys in SQL Server

Find missing foreign/primary keys in SQL Server

Last week I wrote a SQL query to estimate how many columns are missing from foreign or primary keys. This works because of our naming convention for database keys:

  • We use a Code suffix for natural keys e.g. CountryCode = NZ
  • We use an ID suffix for surrogate keys e.g. EmployeeID = 32491

This script looks for any columns that match this naming pattern, but aren’t part of a primary or foreign key relationship.

-- Find columns on tables with names like FooID or FooCode which should-- be part of primary or foreign keys, but aren't.SELECT AS [Table], AS [Column]FROM        sys.tables t        INNER JOIN sys.syscolumns c ON       = t.object_id                -- Join on foreign key columns        LEFT JOIN sys.foreign_key_columns fkc ON                (fkc.parent_object_id = t.object_id                AND c.colid = fkc.parent_column_id)                OR (fkc.referenced_object_id = t.object_id                AND c.colid = fkc.referenced_column_id)                -- Join on primary key columns        LEFT JOIN sys.indexes i ON                i.object_id = t.object_id                and i.is_primary_key = 1        LEFT JOIN sys.index_columns ic ON                ic.object_id = t.object_id                AND ic.index_id = i.index_id                AND ic.column_id = c.colidWHERE        t.is_ms_shipped = 0        AND ( LIKE '%ID' OR LIKE '%Code')        AND        (                fkc.constraint_object_id IS NULL -- Not part of a foreign key                 AND ic.object_id IS NULL -- Not part of a primary key        )        AND        (                -- Ignore some tables       != 'sysdiagrams'                AND NOT LIKE '[_]%' -- temp tables                AND NOT LIKE '%temp%'                AND NOT LIKE '%Log%' -- log tables                                -- Ignore some columns                AND NOT IN ('GLCode', 'EID', 'AID') -- external keys        )ORDER BY,

Using this script, I found over 200 missing foreign keys in one production database!

Back to basics: good source control check-in habits

Back to basics: good source control check-in habits

The other day at work I went over a few good source control habits (TFS-centric) for new developers, and why they’re worth doing. Here are some tips:

Check-ins are coarse-grained

When I first started using source control, a lot of my check-in logs looked like this:

  1. Added useful function foo() (that will be ultimately required for feature X)
  2. Added feature X

I was trying to be helpful by making foo() available for use as early as possible, but most of the time I checked it in too early, and needed to change/fix it later — resulting in additional check-ins. In reality though, all this did was clutter up the change history with signal-to-noise, making it harder to establish what changes were actually required to effect a change.

It also of course violates the YAGNI (you ain’t gonna need it) principle – no one ever needed foo() badly enough to make it worth checking it in early.

Check-ins should be coarsely grained — instead of staggering code changes over a period of time, save it all for one big changeset that contains everything. This actually improves YAGNI — instead of adding endless low-level functions ‘because they might be useful’, you’re forced to associate them with a higher application-level change.

Never check in unfinished work

In the main (trunk) branch, there are two cardinal rules:

  • Never check in anything that breaks the build
  • Never check in an unfinished feature

The main branch should always be in a state where it’s ready for release. Features have been entirely added or don’t exist yet; complete changes have been made, or not even started. There is no middle ground.

Having to worry about the possibly unfinished state of the ‘master copy’ is just another burden on your team. Even worse, if you check in code that doesn’t compile, you can interrupt other people’s work and waste their time commenting out bad blocks and working around it.

Sometimes, however, you’ll get a piece of work that doesn’t fit in one check-in.

  • It could take a long time to implement (needs more than one check-in).
  • Multiple developers might need to work on it.
  • The change isn’t going to be released (committed to the main branch) until a later date.

You have two options: fork a copy of the code for a new development branch, or shelve your changes so you or someone else can work on them later. The basic rule for branches is if you need more than one check-in to make a change, you need to branch and merge back in later.

Source control is not for back ups

This follows from the previous two tips. Check in your work because you’re completely done and finished — not because it’s the end of the day, or because it’s the weekend etc.

I think this habit often begins when using reserved-checkout source control systems like VSS, where having a file checked out explicitly blocks other people from editing it. Everyone who’s used VSS knows from experience that hilarious situation where you need to edit a checked-out file, but the person who has it is on holiday or otherwise unreachable.

VSS file exclusively checked out error

To avoid this, teams adapt their work habits to suit VSS, checking in frequently so as not to lock files for extended periods. This practice is continued for some reason (cargo cult programming?) when moving to less brain-damaged source control tools like TFS or subversion.

Anyway, if you want nightly backups of your local working source code, get some backup software. Don’t use source control — change tracking becomes difficult when you check in based on calendar events. It gets even more ridiculous if you have changes that don’t compile yet, and have to comment/uncomment code when checking it in.

Use a separate check-in when reformatting code

Say you go to edit a file, but the formatting is all messed up — the indenting is wrong. So you fix it up while making your changes. But then, if someone comes along later to see what your change entailed, a diff tool will highlight differences on every line, making it very hard to see what the actual code change was. Instead, use a separate check-in for mass code reformatting.

Write useful check-in comments

If you’re looking over the history of a file in source control, comments like ‘css fix’ or ‘#12345′ won’t be very useful. Instead, good check-in comments need three things:

  • What was changed
  • Why it was changed
  • Any bug/issue tracking numbers (if applicable)

Describe what you changed, and why it was necessary. You don’t have to write an essay — I find one sentence of ‘did X because Y’ is usually sufficient.

TFS change history

You should also list any bug/issue tracking numbers, or make reference to any relevant project documentation (e.g. list of requirements) to provide a higher-level context. This is a good way of establishing traceability on tightly-controlled source repositories — if you do it right, you’ll find almost every change can be traceable back to some external identifier.

I also like to paste my changeset number when I update an issue as being completed (ready for user testing or peer review) — the easier it is to find my changes, the better! (Note that some source control tools can do this automatically from your check-in comment).

Real-life examples

You can see good check-in habits in action in most popular open source projects. Here are a few of examples:

  • Boost C++ Libraries revision log
  • Ruby on Rails commit history
  • Subtext revision log

Best practice DDD/TDD ASP.NET MVC example applications

Best practice DDD/TDD ASP.NET MVC example applications

It’s no secret that I’m a big fan of the ASP.NET MVC Framework. One of the reasons I like it is because it’s opinionated – instead of leaving developers free to fumble things as they choose, it recognises that most projects will be more-or-less exactly the same, and imposes some conventions to keep everyone on the same path.

This is the same philosophy used by many other application frameworks, and it saves everyone time and confusion by providing well-thought-out standards that everyone’s familiar with. However, because ASP.NET MVC is a general purpose web framework, its conventions can only cover so much — guidance for other frameworks and patterns is left to the community.

One very important area (that inspired this article) is around object-relational mappers (ORM). If I’m using the ASP.NET MVC framework:

  • How do I reference a repository from controller?
  • What’s the best way of applying a per-http-request unit of work pattern?
  • What should it look like if I’m using NHibernate? LINQ-to-SQL? etc.

Thankfully, there are a few example applications available that demonstrate solutions to these questions (and many more):

Kym NHibernate, Rhino-Commons, Castle Windsor, NUnit, Rhino Mocks, jQuery, Json.NET
S#arp Architecture NHibernate, Ninject, MvcContrib, NUnit, Rhino Mocks, Json.NET
CarTrackr Linq to SQL, Unity, MSTest, Moq, Visifire Silverlight charts, OpenID
MVC Storefront LINQ to SQL, StructureMap, MSTest, Moq, NLog, CardSpace, Windows Workflow, OpenID
Suteki Shop LINQ to SQL, Castle Windsor, NUnit, Moq, log4net, jQuery

It’s really good to see some different opinions in implementation and design, across a range of frameworks. MVC Storefront is of particular note – everything is explained over a series of 20-or-so podcasts that include interviews with well-known .NET developers like Jeff Atwood (Coding Horror/StackOverflow), and Ayende (NHibernate).

Are your applications ‘legacy code’ before they even hit production?

Are your applications ‘legacy code’ before they even hit production?

When a business has custom software developed, it expects that software to last for a long time. To do this, software must A) continue operating — by adapting to changing technology needs, and B) continue to be useful — by adapting to changing business needs.

Businesses’ needs change all the time, and all but the most trivial of software will require modification beyond its original state at some point in its life, in order to to remain useful. This is sometimes known as brownfield development (as opposed to greenfield), and from my experience, accounts for a good two thirds to three quarters of all enterprise software development.

Anyway, time for a story. A company has a flash new business intelligence system developed (in .NET) to replace an aging mainframe system. Everything goes more-or-less to plan, and the system is delivered complete and on time. It’s robust and works very well — everyone is satisfied.

Now, six months have passed since the original go-live date, and the owners have a list of new new features they want added, plus a few tweaks to existing functionality.

Development begins again. After a few test releases, however, new bugs start to appear in old features that weren’t even part of phase two. Delivery dates slip, and time estimates are thrown out the window as search-and-destroy-style bug fixing takes over. Developers scramble to patch things up, guided only by the most recent round of user testing. Even the simplest of changes take weeks of development, and endless manual testing iterations to get right. The development team looks incompetent, testers are fed up, and stakeholders are left scratching their heads wondering how you could possibly break something that was working fine before. What happened?

The illusion of robustness

The robustness of many systems like this is an illusion — a temporary spike in value that’s only reflective of the fact it has been tested, and is known to work in its current state. Because everything looks fine on the surface, the quality of source code is assumed to be fine as well.

From my experience, a lot of business source code is structured like a house of cards. The software looks good from the outside and works perfectly fine, but if you want to modify or replace a central component — i.e. a card on which other cards depend — you pretty much have to re-build and re-test everything above it from scratch.

Such systems are written as legacy code from the very first day. By legacy, I mean code that is difficult to change without introducing bugs. This is most commonly caused by:

  • Low cohesion – classes and methods with several, unrelated responsibilities
  • High coupling – classes with tight dependencies on each other
  • Lack of unit tests

Low cohesion and high coupling are typical symptoms of code written under time-strapped, feature-by-feature development, where the first version that works is the final version that goes to production.

Once it’s up and running, there’s little motivation for refactoring or improving the design, no matter how ugly the code is — especially if it will require manual re-testing again. This is the value spike trap!

The tragedy of legacy code

Back to the story. If a developer on the project realises the hole they’ve fallen into, they can’t talk about it. This late in the process, such a revelation could only be taken in one of two ways — as an excuse, blaming the original designers of the software for mistakes made today — or as a very serious admission of incompetence in the development team from day one.

The other developers can’t identify any clear problems with the code, and simply assume that all brownfield development must be painful. It’s the classic story — when granted the opportunity to work on a greenfield project, you’re much happier ó unconstrained by years of cruft, you’re free to write clean, good code. Right? But if you don’t understand what was wrong with the last project you worked on, you’ll be doomed to repeat all of its mistakes. Even with the best of intentions, new legacy code is written, and without knowing it, you’ve created another maintenance nightmare just like the one before it.

Clearly, this is not a good place to be in — an endless cycle of hard-to-maintain software. So how can we break it?

Breaking the cycle

I believe that Test Driven Development (TDD) is the only way to keep your software easy-to-modify over the long-term. Why? Because:

  1. You can’t write unit tests for tightly-coupled, low cohesive code. TDD forces you to break dependencies and improve your design, simply in order to get your code into a test harness.
  2. It establishes a base line of confidence — a quantifiable percentage of code that is known to work. You can then rely on this to assert that new changes haven’t broken any existing functionality. This raises the value of code between ‘human tested’ spikes, and will catch a lot of problems before anyone outside the development team ever sees them.

TDD gives you confidence in your code by proving that it works as intended. But object-oriented languages give you a lot freedom, and if you’re not careful, your code can end up as a huge unreadable mess.

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” – Martin Fowler

So how do you write code that humans can understand? Enter TDD’s partner in crime, Domain Driven Design (DDD).

DDD establishes a very clear model of the domain (the subject area that the software was written to address), and puts it above all technological concerns that can put barriers of obfuscation between the business concepts and the code used to implement them (like the limitations of a relational database schema). The end result is very expressive entities, repositories and service classes that are based on real life concepts and can be easily explained to non-technical business stakeholders.

Plus, DDD provides a clear set rules around all the fiddly stuff like relationships and responsibilities between entities. In short, DDD is simply object-oriented code done right.

Putting it to practice

So how do you get on to a point where you can use all this? There’s a lot to learn — an entire universe of information on top of your traditional computer science/software development education. I’ve only scratched the surface here in trying to explain why I think it’s important, but if you want to learn more:

  • Read books. How many books on software development have you read this year? Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans is pretty much the DDD bible, and, just like GoF, should be mandatory reading for any serious developer. If you’re stuck in legacy code hell right now though, have a read of Michael Feathers’ Working Effectively with Legacy Code. It’s all about breaking dependencies and beating existing code into test harnesses so you can get on with improving it without fear of unintended consequences.
  • Read blogs. TDD/DDD is a hot subject right now, and blogs are great for staying on top of all the latest tools and methods. They’re also a good place for learning about TDD/DDD’s support concepts like dependency injection, inversion of control, mocking, object/relational mapping, etc.
  • Join mailing lists — DomainDrivenDesign and ALT.NET (if you’re that way inclined) are two good ones with a reasonably high volume of great intellectual discussion. You can gain a lot of insight reading other people’s real-life problems, and watching the discussion evolve as potential solutions are debated.
  • Practise. Grab an ORM, a unit test framework/runner, an IoC container and start playing around.

I believe the only way to make software truly maintainable long-term (without becoming legacy code) is to use TDD and DDD. Together, they’re all about building confidence — confidence that you can make changes to a system without introducing any bugs. Confidence that other people can understand your code and the concepts it represents. And most importantly, confidence that you’re learning, and aren’t making all the same mistakes every time you start a new project.

Getting SQL Server 2008 database projects in VS 2008 SP1

Getting SQL Server 2008 database projects in VS 2008 SP1

So, it seems Service Pack 1 for Visual Studio 2008 adds some support for SQL Server 2008, in that you can now connect and browse SQL Server 2008 servers in the Server Explorer. This’ll let you do cool stuff like generate code with LINQ to SQL, but there’s one important feature missing:

No SQL Server 2008 datbase project template in Visual Studio 2008 SP1

Where’s the SQL Server 2008 database project template?

If you and try to create a SQL 2000 or 2005 project, Visual Studio will ask for a local SQL Server 2005 instance:

There’s no way around this — “design-time validation” cannot be disabled, and SQL Server 2008 isn’t supported yet. In other words, unless you have SQL Server 2005 installed, you cannot open or create Visual Studio database projects at all. I was pretty dismayed to discover this — all I wanted was a place to chuck some .sql database migrations inside a solution!

However, you can download a temporary fix. Grab the VSTS 2008 Database Edition GDR August CTP. I have no idea what GDR stands for, but it’ll solve all your problems by adding new project types that don’t require a local SQL Server instance at all:

SQL Server 2008 database projects from VSTS Database Edition GDR CTP

The final release is due out this Spring.

Gallio, the framework-agnostic test runner for .NET

Over the past week, I’ve been converting a bunch of unit tests written with NUnit to Microsoft’s MSTest framework, the tool of choice for this particular project.

I’m not a big fan of either the MSTest framework, nor Visual Studio’s in-built test runner. It feels like you’re running unit tests in Access. Plus, the last thing my Visual Studio needs is to be cluttered up with more windows and toolbars.

Anyway, here’s where Gallio comes in. It’s a open source, framework-agnostic test runner that supports MbUnit, MSTest, NBehave, NUnit, and xUnit.Net frameworks. Running tests is quite similar to NUnit — you add test assembly paths to a .gallio project file and fire them up in Icarus, Gallio’s graphical test runner:

As well as being a nicer test runner for MSTest, Gallio is ideal for managing tests in a mixed-framework environment. It also integrates with MSBuild, NAnt, CruiseControl.NET for automated testing during your build process.

Passing a list of values into a stored procedure with XML

Passing a list of values into a stored procedure with XML

Imagine you have a list of unrelated items in your .NET application, and you need SQL Server to do something for each one. For example:

  • A customer has a shopping cart containing a list of 10 product IDs. The shopping cart is stored in ASP.NET session memory on the web server. How can you retrieve details about these ten products without knocking together some horrific WHERE clause, or executing 10 separate SELECT statements?
  • An administration section of an application allows a user to mass-edit a list of items, and save them all with one click. But the usp_updateItem stored procedure can only save one item at a time.

To minimise the number of round-trips to the database, you need to pass in multiple items at once to the same stored procedure. This is where an XML type parameter can help.

Here’s a fragment of XML containing the list of employee names and IDs I want to pass to my stored procedure:

<employees>      <employee employeeId="401312" name="John Smith" />      <employee employeeId="345334" name="John Doe" />      <employee employeeId="997889" name="Jane Doe" /></employees>

I’ll populate a table variable (so I can JOIN on it later) with an XPath query using the XML data type’s nodes() method. The technical term for this is shredding, which is pretty rad.

CREATE PROCEDURE FooBar(@employees XML)ASBEGIN      -- Create a table variable to store my items.      DECLARE @employee TABLE(EmployeeID INT, Name VARCHAR(20))      -- Shred data carried in the XML and populate the table variable with it.      INSERT INTO @employee      SELECT            e.value('@employeeId', 'INT'),            e.value('@name', 'VARCHAR(20)')      FROM            @employees.nodes('//employee') Employee(e)      -- Select from table variable as usual.      SELECT * FROM @employee eEND

Easy, huh? You can easily pass in a set of values with one XML parameter and a couple of lines of T-SQL. Note that you can of course simply shred the XML directly, as part of a bigger query – the temporary table variable is completely optional.

Passing in multiple columns isn’t a problem either. In fact, if you want to go really crazy with this stuff, you could even handle n-dimensional data structures by using nested XML elements.

This is my last article on T-SQL, by the way. I promise.

More nested XML with SQL Server: n-level tree recursion

During my foray into XML SQL queries this week, I was presented with another challenge. Instead of getting just the immediate children of a category, I now needed to recursively select all children from a tree – to an unlimited depth.

A Common Table Expression (CTE, aka WITH statement) can also be called recursively, but requires UNION ALL to join the recursive and anchor members — and XML columns can’t be unioned.

Instead, we need a user defined function that returns XML TYPE. It’ll give us a rootless collection of products within a category, and call itself again to get sub-categories. Got it? Here’s the function definition, again using the AdventureWorks database:

CREATE FUNCTION GetProductCategoryChildren(        @ParentProductCategoryID INT)RETURNS XMLASBEGIN        RETURN        (        SELECT                -- Map columns to XML attributes/elements with XPath selector                category.ProductCategoryID AS '@id',                category.Name AS '@name',                (                        -- Use a sub query for child elements.                        SELECT                                ProductID AS '@id',                                Name AS '@name',                                ListPrice AS '@price'                        FROM                                SalesLT.Product                        WHERE                                ProductCategoryID = category.ProductCategoryID                        FOR                                XML PATH('product'),  -- The element name for each row.                                TYPE -- Column is typed so it nests as XML, not text.                ) AS 'products',                dbo.GetProductCategoryChildren(category.ProductCategoryID)                        AS 'categories' -- Recursive call to get child categories.        FROM                SalesLT.ProductCategory category        WHERE                category.ParentProductCategoryID = @ParentProductCategoryID        FOR                XML PATH('category'),  -- The element name for each row.                TYPE -- The root element name for this result set.        )END

This function works great. But we still want to get details about the group itself (not just its children), and we still need a root node so we can load it into an XmlDocument. Here’s how to wrap the call to this function to get a root node and details about the parent:

-- Get the parent group's name and child products.SELECT        category.ProductCategoryID AS '@id',        category.Name AS '@name',        (                SELECT                        ProductID AS '@id',                        Name AS '@name',                        ListPrice AS '@price'                FROM                        SalesLT.Product                WHERE                        ProductCategoryID = category.ProductCategoryID                FOR                        XML PATH('product'), TYPE        ) AS 'products',        -- start recursing to get child categories.        dbo.GetProductCategoryChildren(category.ProductCategoryID) AS 'categories'FROM        SalesLT.ProductCategory categoryWHERE        category.CategoryID = 2FOR        XML PATH('category'), ROOT('categories')

This is what the output looks like. It’ll go for as many levels of depth as your tree does.

<categories>  <category id="2" name="Components">    <categories>      <category id="8" name="Handlebars">        <products>          <product id="808" name="LL Mountain Handlebars" price="44.5400" />          <product id="809" name="ML Mountain Handlebars" price="61.9200" />          <product id="810" name="HL Mountain Handlebars" price="120.2700" />          <product id="811" name="LL Road Handlebars" price="44.5400" />          <product id="812" name="ML Road Handlebars" price="61.9200" />        </products>        <categories>          <category id="9" name="Bottom Brackets">            <products>              <product id="994" name="LL Bottom Bracket" price="53.9900" />              <product id="995" name="ML Bottom Bracket" price="101.2400" />              <product id="996" name="HL Bottom Bracket" price="121.4900" />            </products>            <categories>              <category id="11" name="Chains">                <products>                  <product id="952" name="Chain" price="20.2400" />                </products>              </category>            </categories>          </category>          <category id="10" name="Brakes">            <products>              <product id="907" name="Rear Brakes" price="106.5000" />              <product id="948" name="Front Brakes" price="106.5000" />            </products>          </category>        </categories>      </category>      <category id="12" name="Cranksets">        <products>          <product id="949" name="LL Crankset" price="175.4900" />          <product id="950" name="ML Crankset" price="256.4900" />          <product id="951" name="HL Crankset" price="404.9900" />        </products>      </category>    </categories>  </category></categories>

Note I had to rearrange some of the categories in the AdventureWorks database to get deeper nesting.

Nested FOR XML results with SQL Server’s PATH mode

Today, while doing some work on a highly data (not object) driven .NET application, I needed a query output as XML from the application’s SQL Server 2005 database. I wanted:

  • Nicely formatted and properly mapped XML (e.g. no <row> elements as found in FOR XML RAW mode)
  • To be able to easily map columns to XML elements and attributes
  • A single root node, so I can load it into an XmlDocument without having to create the root node myself
  • Nested child elements
  • Not to have to turn my elegant little query into a huge mess of esoteric T-SQL (as with [Explicit!1!Mode])

I discovered that all this is surprisingly easy to achieve all of these things with SQL Server 2005′s FOR XML PATH mode. (I say surprising, because I’ve tried this sort of thing with FOR XML AUTO a few times before under SQL Server 2000, and gave up each time).

Here’s a quick example I’ve created using the venerable AdventureWorks example database, with comments against all the important bits:

    -- Map columns to XML attributes/elements with XPath selectors.
    category.ProductCategoryID AS '@id',
    category.Name AS '@name',
        -- Use a sub query for child elements.
            ProductID AS '@id',
            Name AS '@name',
            ListPrice AS '@price'
            ProductCategoryID = category.ProductCategoryID
            XML PATH('product'), -- The element name for each row.
            TYPE -- Column is typed so it nests as XML, not text.
    AS 'products' -- The root element name for this child collection.
    SalesLT.ProductCategory category
    XML PATH('category'), -- The element name for each row.
    ROOT('categories'-- The root element name for this result set.

As you can see, we’ve mapped columns to attributes/elements with XPath selectors, and set row and root element names with PATH() and ROOT() respectively.

Plus, by specifying my own names for everything, I was also able to address the difference in capitalization, prefixing and pluralization style between the AdventureWorks’ database table names and common XML.

Running this query produces output in the following format. Note the root nodes for both outer and child collections:

  <category id="4" name="Accessories" />
  <category id="24" name="Gloves">
      <product id="858" name="Half-Finger Gloves, S" price="24.4900" />
      <product id="859" name="Half-Finger Gloves, M" price="24.4900" />
      <product id="860" name="Half-Finger Gloves, L" price="24.4900" />
      <product id="861" name="Full-Finger Gloves, S" price="37.9900" />
      <product id="862" name="Full-Finger Gloves, M" price="37.9900" />
      <product id="863" name="Full-Finger Gloves, L" price="37.9900" />
  <category id="35" name="Helmets">
      <product id="707" name="Sport-100 Helmet, Red" price="34.9900" />
      <product id="708" name="Sport-100 Helmet, Black" price="34.9900" />
      <product id="711" name="Sport-100 Helmet, Blue" price="34.9900" />