IRepository: one size does not fit all

I’ve been spending a lot of time putting TDD/DDD into practice lately, and obsessing over all sorts of small-yet-important details as I try concepts and patterns for myself.

One pattern that has recently become popular in mainstream .NET is IRepository<T>. Originally documented in PoEAA, a repository is an adapter that can read and write objects from the database as if it were a simple in-memory collection like an array. This gives you persistence ignorance (PI): the rest of the application can forget about the underlying database/ORM semantics.

Somewhere along the line however, someone realized that with generics you can create a single, shared interface that will support pretty much any operation you could ever want to do with any sort of object. They usually look something like this:

public interface IRepository<T>
{
    T GetById(int id);
    IEnumerable<T> GetAll();
    IEnumerable<T> FindAll(IDictionary<string, object> propertyValuePairs);
    T FindOne(IDictionary<string, object> propertyValuePairs);
    T SaveOrUpdate(T entity);
    void Delete(T entity);
}

If you need to add any custom behaviour, you can simply subclass it — e.g. IProjectRepository : IRepository<Project> — and add the new methods there. That’s pretty handy for a quick forms-over-LINQ-to-SQL application.

However, I don’t believe this is satisfactory for applications using domain-driven design (DDD), as repositories can vary greatly in their capabilities. For example, some repositories will allow aggregates to be added, but not deleted (like legal records). Others might be completely read-only. Trying to shoehorn them all into a one-size-fits-all IRepository<T> interface will simply result in a lot of leaky abstractions: you could end up with a Remove() method that is available but always throws InvalidOperationException, or developer team rules like “never call Save() on RepositoryX”. That would be pretty bad, so what else can we do instead?

Possibility #2: no common repository interface at all

The first alternative is simply dropping the IRepository<T> base and making a totally custom repository for each aggregate.

public interface IProjectRepository
{
    Project GetById(int id);
    void Delete(Project entity);
    // ... etc
}

This is good, because we won’t inherit a bunch of crap we don’t want, but similar repositories will have a lot of duplicate code. Making method names consistent is now left to the developer — we could end up with one repository having Load(), another Get(), another Retrieve() etc. This isn’t very good.

Thinking about it, a lot of repositories are going to fit into one or two broad categories that share a few common methods — those that support reading and writing, and those that only support reading. What if we extracted these categories out into semi-specialized base interfaces?

Possibility #3: IReadOnlyRepository and IMutableRepository

What if we provided base interfaces for common repository types like this:
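Something along these lines — note this is my sketch of what such base interfaces could contain, with the exact method lists chosen for illustration:

```csharp
// A sketch of the two semi-specialized base interfaces. The method lists
// are illustrative; a real project would pick whatever its repositories share.
public interface IReadOnlyRepository<TEntity, TKey>
{
    TEntity GetById(TKey id);
    IEnumerable<TEntity> GetAll();
}

public interface IMutableRepository<TEntity, TKey> :
    IReadOnlyRepository<TEntity, TKey>
{
    void Save(TEntity entity);
    void Remove(TEntity entity);
}
```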

This is better, but still doesn’t satisfy all needs. Providing a GetAll() method might be helpful for a small collection of aggregates that we enumerate over often, but it wouldn’t make so much sense for a database of a million customers. We still need to be able to include/exclude standard capabilities at a more granular level.

Possibility #4: Repository Traits

Let’s create a whole load of little fine-grained interfaces — one for each standard method a repository might have.

public interface ICanGetAll<T>
{
    IEnumerable<T> GetAll();
}

public interface ICanGetById<TEntity, TKey>
{
    TEntity GetById(TKey id);
}

public interface ICanRemove<T>
{
    void Remove(T entity);
}

public interface ICanSave<T>
{
    void Save(T entity);
}

public interface ICanGetCount
{
    int GetCount();
}

// ...etc

I am calling these Repository Traits. When defining a repository interface for a particular aggregate, you can explicitly pick and choose which methods it should have, as well as adding your own custom ones:

public interface IProjectRepository :
    ICanSave<Project>,
    ICanRemove<Project>,
    ICanGetById<Project, int>,
    ICanGetByName<Project>
{
    IEnumerable<Project> GetProjectsForUser(User user);
}

This lets you define repositories that can do as little or as much as they need to, and no more. If you recognize a new trait that may be shared by several repositories — e.g., ICanDeleteAll — all you need to do is define and implement a new interface.

Side note: concrete repositories can still have generic bases

Out of interest, here’s what my concrete PersonRepository looks like:
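Something like this — a sketch, with hypothetical query methods and an assumed NHibernateRepository<T> base that does the actual persistence work:

```csharp
// Sketch of a concrete repository. The generic NHibernateRepository<Person>
// base (not shown) already implements Save, Remove, GetById etc., so the only
// new code here is the aggregate-specific queries. GetAllWithSurname is a
// hypothetical example method.
public class PersonRepository : NHibernateRepository<Person>, IPersonRepository
{
    public IEnumerable<Person> GetAllWithSurname(string surname)
    {
        return Session
            .CreateQuery("from Person p where p.Surname = :surname")
            .SetString("surname", surname)
            .List<Person>();
    }
}
```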

There’s very little new code in it, because all of the common repository traits are already satisfied by a generic, one-size-fits-all NHibernateRepository base class (which must be completely hidden from external callers!). The IPersonRepository just defines which subset of its methods are available to the domain layer.

Cargo-cult commenting

Currently I am responsible for maintaining a big legacy .NET ERP system. It’s plagued by something I like to call cargo-cult commenting — source code comments that are ritually included, for some reason, but serve no real purpose at all. These are the sort of comments that:

  • Are written for aesthetic reasons more than anything else
  • Usually just repeat the next line of C# in sentence form
  • Tell you what the code does, not the reasons why
  • Don’t add any value whatsoever
  • Just make the code base even more bloated and crappy than it was before

Here are a few examples to demonstrate exactly what I mean.

/// <summary>
/// POST
/// </summary>
/// <param name="oSpace"></param>
public override void POST(ref Space oSpace)

//Create a new attribute.
XmlAttribute newAttr = xmlDocuments.CreateAttribute("Directory");
newAttr.Value = "CompanyCorrespondenceDocs";

// read-only?
if (bVehicleEdit == false) oField.SetReadWrite(ref oSpace, true);

// return the vehicle risk
return DataAccess.ExecuteXml("Result", "Row", "Vehicle_VehicleRisk_Get",
    iVehicleID).SelectSingleNode("Result/Row");

This sort of rubbish gets in the system because people are taught the stupid rule that comments are always good, no matter what their purpose is. One developer sees another developer writing a line of comments for every 2-3 lines of code, copies it, and pretty soon sections without any green look naked and unfinished. If only unit tests were written like this!

Anyway, my golden rules for code commenting are:

  1. There are only two valid reasons for writing comments — to explain why code was written (e.g. original business motivation), or as a warning when something unexpected happens (here be dragons).
  2. Write the reasons why something is done, not how.
  3. If you find yourself putting comments around little blocks of 3-6 lines of related code, each of these blocks should probably be a separate method with a descriptive name instead.
  4. If you can’t write anything useful (i.e. that couldn’t be determined by a quick glance at the code), don’t write anything at all.
  5. BUT if you find yourself regularly needing to write comments because it is not clear what’s going on from the code, then your code probably isn’t expressive enough.
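Rule 3 in practice: instead of labelling a block of code with a comment, extract the block into a method whose name says the same thing. The names below are made up for illustration:

```csharp
// Before: a comment labels the block.
// Check the customer is allowed to order on credit.
if (customer.Balance + order.Total > customer.CreditLimit)
    throw new InvalidOperationException("Credit limit exceeded.");

// After: the check becomes a method with a descriptive name,
// and the comment is no longer needed at the call site.
private static bool ExceedsCreditLimit(Customer customer, Order order)
{
    return customer.Balance + order.Total > customer.CreditLimit;
}
```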

Writing good comments is hard. One habit I’ve recently adopted is pasting entire paragraphs of the relevant user requirements into my code to explain why it was written. It’s working really well — it saves me time, and produces comments of very high quality, straight from the horse’s mouth. I encourage you to try it too.

Find missing foreign/primary keys in SQL Server

Last week I wrote a SQL query to estimate how many columns are missing from foreign or primary keys. This works because of our naming convention for database keys:

  • We use a Code suffix for natural keys e.g. CountryCode = NZ
  • We use an ID suffix for surrogate keys e.g. EmployeeID = 32491

This script looks for any columns that match this naming pattern, but aren’t part of a primary or foreign key relationship.

-- Find columns on tables with names like FooID or FooCode which should
-- be part of primary or foreign keys, but aren't.
SELECT
    t.name AS [Table],
    c.name AS [Column]
FROM
    sys.tables t
    INNER JOIN sys.syscolumns c ON
        c.id = t.object_id
    -- Join on foreign key columns
    LEFT JOIN sys.foreign_key_columns fkc ON
        (fkc.parent_object_id = t.object_id
        AND c.colid = fkc.parent_column_id)
        OR (fkc.referenced_object_id = t.object_id
        AND c.colid = fkc.referenced_column_id)
    -- Join on primary key columns
    LEFT JOIN sys.indexes i ON
        i.object_id = t.object_id
        AND i.is_primary_key = 1
    LEFT JOIN sys.index_columns ic ON
        ic.object_id = t.object_id
        AND ic.index_id = i.index_id
        AND ic.column_id = c.colid
WHERE
    t.is_ms_shipped = 0
    AND (c.name LIKE '%ID' OR c.name LIKE '%Code')
    AND
    (
        fkc.constraint_object_id IS NULL -- Not part of a foreign key
        AND ic.object_id IS NULL -- Not part of a primary key
    )
    AND
    (
        -- Ignore some tables
        t.name != 'sysdiagrams'
        AND t.name NOT LIKE '[_]%' -- temp tables
        AND t.name NOT LIKE '%temp%'
        AND t.name NOT LIKE '%Log%' -- log tables

        -- Ignore some columns
        AND c.name NOT IN ('GLCode', 'EID', 'AID') -- external keys
    )
ORDER BY
    t.name,
    c.name

Using this script, I found over 200 missing foreign keys in one production database!

Are your applications ‘legacy code’ before they even hit production?

When a business has custom software developed, it expects that software to last for a long time. To do this, software must A) continue operating — by adapting to changing technology needs, and B) continue to be useful — by adapting to changing business needs.

Businesses’ needs change all the time, and all but the most trivial of software will require modification beyond its original state at some point in its life, in order to remain useful. This is sometimes known as brownfield development (as opposed to greenfield), and from my experience, accounts for a good two thirds to three quarters of all enterprise software development.

Anyway, time for a story. A company has a flash new business intelligence system developed (in .NET) to replace an aging mainframe system. Everything goes more-or-less to plan, and the system is delivered complete and on time. It’s robust and works very well — everyone is satisfied.

Now, six months have passed since the original go-live date, and the owners have a list of new features they want added, plus a few tweaks to existing functionality.

Development begins again. After a few test releases, however, new bugs start to appear in old features that weren’t even part of phase two. Delivery dates slip, and time estimates are thrown out the window as search-and-destroy-style bug fixing takes over. Developers scramble to patch things up, guided only by the most recent round of user testing. Even the simplest of changes take weeks of development, and endless manual testing iterations to get right. The development team looks incompetent, testers are fed up, and stakeholders are left scratching their heads wondering how you could possibly break something that was working fine before. What happened?

The illusion of robustness

The robustness of many systems like this is an illusion — a temporary spike in value that’s only reflective of the fact it has been tested, and is known to work in its current state. Because everything looks fine on the surface, the quality of source code is assumed to be fine as well.

From my experience, a lot of business source code is structured like a house of cards. The software looks good from the outside and works perfectly fine, but if you want to modify or replace a central component — i.e. a card on which other cards depend — you pretty much have to re-build and re-test everything above it from scratch.

Such systems are written as legacy code from the very first day. By legacy, I mean code that is difficult to change without introducing bugs. This is most commonly caused by:

  • Low cohesion – classes and methods with several, unrelated responsibilities
  • High coupling – classes with tight dependencies on each other
  • Lack of unit tests

Low cohesion and high coupling are typical symptoms of code written under time-strapped, feature-by-feature development, where the first version that works is the final version that goes to production.

Once it’s up and running, there’s little motivation for refactoring or improving the design, no matter how ugly the code is — especially if it will require manual re-testing again. This is the value spike trap!

The tragedy of legacy code

Back to the story. If a developer on the project realises the hole they’ve fallen into, they can’t talk about it. This late in the process, such a revelation could only be taken in one of two ways — as an excuse, blaming the original designers of the software for mistakes made today — or as a very serious admission of incompetence in the development team from day one.

The other developers can’t identify any clear problems with the code, and simply assume that all brownfield development must be painful. It’s the classic story — when granted the opportunity to work on a greenfield project, you’re much happier: unconstrained by years of cruft, you’re free to write clean, good code. Right? But if you don’t understand what was wrong with the last project you worked on, you’ll be doomed to repeat all of its mistakes. Even with the best of intentions, new legacy code is written, and without knowing it, you’ve created another maintenance nightmare just like the one before it.

Clearly, this is not a good place to be in — an endless cycle of hard-to-maintain software. So how can we break it?

Breaking the cycle

I believe that Test Driven Development (TDD) is the only way to keep your software easy-to-modify over the long-term. Why? Because:

  1. You can’t write unit tests for tightly-coupled, low cohesive code. TDD forces you to break dependencies and improve your design, simply in order to get your code into a test harness.
  2. It establishes a baseline of confidence — a quantifiable percentage of code that is known to work. You can then rely on this to assert that new changes haven’t broken any existing functionality. This raises the value of code between ‘human tested’ spikes, and will catch a lot of problems before anyone outside the development team ever sees them.

TDD gives you confidence in your code by proving that it works as intended. But object-oriented languages give you a lot of freedom, and if you’re not careful, your code can end up as a huge unreadable mess.

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” – Martin Fowler

So how do you write code that humans can understand? Enter TDD’s partner in crime, Domain Driven Design (DDD).

DDD establishes a very clear model of the domain (the subject area that the software was written to address), and puts it above all technological concerns that can put barriers of obfuscation between the business concepts and the code used to implement them (like the limitations of a relational database schema). The end result is very expressive entities, repositories and service classes that are based on real life concepts and can be easily explained to non-technical business stakeholders.

Plus, DDD provides a clear set of rules around all the fiddly stuff like relationships and responsibilities between entities. In short, DDD is simply object-oriented code done right.

Putting it to practice

So how do you get to a point where you can use all this? There’s a lot to learn — an entire universe of information on top of your traditional computer science/software development education. I’ve only scratched the surface here in trying to explain why I think it’s important, but if you want to learn more:

  • Read books. How many books on software development have you read this year? Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans is pretty much the DDD bible, and, just like GoF, should be mandatory reading for any serious developer. If you’re stuck in legacy code hell right now though, have a read of Michael Feathers’ Working Effectively with Legacy Code. It’s all about breaking dependencies and beating existing code into test harnesses so you can get on with improving it without fear of unintended consequences.
  • Read blogs. TDD/DDD is a hot subject right now, and blogs are great for staying on top of all the latest tools and methods. They’re also a good place for learning about TDD/DDD’s support concepts like dependency injection, inversion of control, mocking, object/relational mapping, etc.
  • Join mailing lists — DomainDrivenDesign and ALT.NET (if you’re that way inclined) are two good ones with a reasonably high volume of great intellectual discussion. You can gain a lot of insight reading other people’s real-life problems, and watching the discussion evolve as potential solutions are debated.
  • Practise. Grab an ORM, a unit test framework/runner, an IoC container and start playing around.

I believe the only way to make software truly maintainable long-term (without becoming legacy code) is to use TDD and DDD. Together, they’re all about building confidence — confidence that you can make changes to a system without introducing any bugs. Confidence that other people can understand your code and the concepts it represents. And most importantly, confidence that you’re learning, and aren’t making all the same mistakes every time you start a new project.

Getting SQL Server 2008 database projects in VS 2008 SP1

So, it seems Service Pack 1 for Visual Studio 2008 adds some support for SQL Server 2008, in that you can now connect and browse SQL Server 2008 servers in the Server Explorer. This’ll let you do cool stuff like generate code with LINQ to SQL, but there’s one important feature missing:

No SQL Server 2008 database project template in Visual Studio 2008 SP1

Where’s the SQL Server 2008 database project template?

If you try to create a SQL Server 2000 or 2005 project, Visual Studio will ask for a local SQL Server 2005 instance:

There’s no way around this — “design-time validation” cannot be disabled, and SQL Server 2008 isn’t supported yet. In other words, unless you have SQL Server 2005 installed, you cannot open or create Visual Studio database projects at all. I was pretty dismayed to discover this — all I wanted was a place to chuck some .sql database migrations inside a solution!

However, you can download a temporary fix. Grab the VSTS 2008 Database Edition GDR August CTP. I have no idea what GDR stands for, but it’ll solve all your problems by adding new project types that don’t require a local SQL Server instance at all:

SQL Server 2008 database projects from VSTS Database Edition GDR CTP

The final release is due out this Spring.

Gallio, the framework-agnostic test runner for .NET

Over the past week, I’ve been converting a bunch of unit tests written with NUnit to Microsoft’s MSTest framework, the tool of choice for this particular project.

I’m not a big fan of either the MSTest framework or Visual Studio’s in-built test runner. It feels like you’re running unit tests in Access. Plus, the last thing my Visual Studio needs is to be cluttered up with more windows and toolbars.

Anyway, here’s where Gallio comes in. It’s an open source, framework-agnostic test runner that supports the MbUnit, MSTest, NBehave, NUnit, and xUnit.Net frameworks. Running tests is quite similar to NUnit — you add test assembly paths to a .gallio project file and fire them up in Icarus, Gallio’s graphical test runner:

As well as being a nicer test runner for MSTest, Gallio is ideal for managing tests in a mixed-framework environment. It also integrates with MSBuild, NAnt and CruiseControl.NET for automated testing during your build process.

Passing a list of values into a stored procedure with XML

Imagine you have a list of unrelated items in your .NET application, and you need SQL Server to do something for each one. For example:

  • A customer has a shopping cart containing a list of 10 product IDs. The shopping cart is stored in ASP.NET session memory on the web server. How can you retrieve details about these ten products without knocking together some horrific WHERE clause, or executing 10 separate SELECT statements?
  • An administration section of an application allows a user to mass-edit a list of items, and save them all with one click. But the usp_updateItem stored procedure can only save one item at a time.

To minimise the number of round-trips to the database, you need to pass in multiple items at once to the same stored procedure. This is where an XML type parameter can help.

Here’s a fragment of XML containing the list of employee names and IDs I want to pass to my stored procedure:

<employees>
  <employee employeeId="401312" name="John Smith" />
  <employee employeeId="345334" name="John Doe" />
  <employee employeeId="997889" name="Jane Doe" />
</employees>

I’ll populate a table variable (so I can JOIN on it later) with an XPath query using the XML data type’s nodes() method. The technical term for this is shredding, which is pretty rad.

CREATE PROCEDURE FooBar(@employees XML)
AS
BEGIN
    -- Create a table variable to store my items.
    DECLARE @employee TABLE(EmployeeID INT, Name VARCHAR(20))

    -- Shred data carried in the XML and populate the table variable with it.
    INSERT INTO @employee
    SELECT
        e.value('@employeeId', 'INT'),
        e.value('@name', 'VARCHAR(20)')
    FROM
        @employees.nodes('//employee') Employee(e)

    -- Select from table variable as usual.
    SELECT * FROM @employee e
END

Easy, huh? You can easily pass in a set of values with one XML parameter and a couple of lines of T-SQL. Note that you can of course simply shred the XML directly, as part of a bigger query – the temporary table variable is completely optional.

Passing in multiple columns isn’t a problem either. In fact, if you want to go really crazy with this stuff, you could even handle n-dimensional data structures by using nested XML elements.
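On the .NET side, the XML fragment can be built with LINQ to XML and passed as an ordinary XML-typed parameter. This is a sketch; the connectionString variable and the employees collection are placeholders, and it assumes the FooBar procedure from above:

```csharp
// Sketch: build the <employees> fragment and pass it to the stored
// procedure as a single XML-typed parameter, saving a round-trip per item.
var xml = new XElement("employees",
    employees.Select(e => new XElement("employee",
        new XAttribute("employeeId", e.Id),
        new XAttribute("name", e.Name))));

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("FooBar", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@employees", SqlDbType.Xml).Value = xml.ToString();

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        // ... read the result set as usual.
    }
}
```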

This is my last article on T-SQL, by the way. I promise.

More nested XML with SQL Server: n-level tree recursion

During my foray into XML SQL queries this week, I was presented with another challenge. Instead of getting just the immediate children of a category, I now needed to recursively select all children from a tree – to an unlimited depth.

A Common Table Expression (CTE, aka WITH statement) can also be called recursively, but requires UNION ALL to join the recursive and anchor members — and XML columns can’t be unioned.

Instead, we need a user-defined function that returns the XML type. It’ll give us a rootless collection of products within a category, and call itself again to get sub-categories. Got it? Here’s the function definition, again using the AdventureWorks database:

CREATE FUNCTION GetProductCategoryChildren(
    @ParentProductCategoryID INT
)
RETURNS XML
AS
BEGIN
    RETURN
    (
    SELECT
        -- Map columns to XML attributes/elements with XPath selectors.
        category.ProductCategoryID AS '@id',
        category.Name AS '@name',
        (
            -- Use a sub query for child elements.
            SELECT
                ProductID AS '@id',
                Name AS '@name',
                ListPrice AS '@price'
            FROM
                SalesLT.Product
            WHERE
                ProductCategoryID = category.ProductCategoryID
            FOR
                XML PATH('product'), -- The element name for each row.
                TYPE -- Column is typed so it nests as XML, not text.
        ) AS 'products',
        dbo.GetProductCategoryChildren(category.ProductCategoryID)
            AS 'categories' -- Recursive call to get child categories.
    FROM
        SalesLT.ProductCategory category
    WHERE
        category.ParentProductCategoryID = @ParentProductCategoryID
    FOR
        XML PATH('category'), -- The element name for each row.
        TYPE -- Typed, so it nests as XML when composed by the caller.
    )
END

This function works great. But we still want to get details about the group itself (not just its children), and we still need a root node so we can load it into an XmlDocument. Here’s how to wrap the call to this function to get a root node and details about the parent:

-- Get the parent group's name and child products.SELECT        category.ProductCategoryID AS '@id',        category.Name AS '@name',        (                SELECT                        ProductID AS '@id',                        Name AS '@name',                        ListPrice AS '@price'                FROM                        SalesLT.Product                WHERE                        ProductCategoryID = category.ProductCategoryID                FOR                        XML PATH('product'), TYPE        ) AS 'products',        -- start recursing to get child categories.        dbo.GetProductCategoryChildren(category.ProductCategoryID) AS 'categories'FROM        SalesLT.ProductCategory categoryWHERE        category.CategoryID = 2FOR        XML PATH('category'), ROOT('categories')

This is what the output looks like. It’ll go for as many levels of depth as your tree does.

<categories>
  <category id="2" name="Components">
    <categories>
      <category id="8" name="Handlebars">
        <products>
          <product id="808" name="LL Mountain Handlebars" price="44.5400" />
          <product id="809" name="ML Mountain Handlebars" price="61.9200" />
          <product id="810" name="HL Mountain Handlebars" price="120.2700" />
          <product id="811" name="LL Road Handlebars" price="44.5400" />
          <product id="812" name="ML Road Handlebars" price="61.9200" />
        </products>
        <categories>
          <category id="9" name="Bottom Brackets">
            <products>
              <product id="994" name="LL Bottom Bracket" price="53.9900" />
              <product id="995" name="ML Bottom Bracket" price="101.2400" />
              <product id="996" name="HL Bottom Bracket" price="121.4900" />
            </products>
            <categories>
              <category id="11" name="Chains">
                <products>
                  <product id="952" name="Chain" price="20.2400" />
                </products>
              </category>
            </categories>
          </category>
          <category id="10" name="Brakes">
            <products>
              <product id="907" name="Rear Brakes" price="106.5000" />
              <product id="948" name="Front Brakes" price="106.5000" />
            </products>
          </category>
        </categories>
      </category>
      <category id="12" name="Cranksets">
        <products>
          <product id="949" name="LL Crankset" price="175.4900" />
          <product id="950" name="ML Crankset" price="256.4900" />
          <product id="951" name="HL Crankset" price="404.9900" />
        </products>
      </category>
    </categories>
  </category>
</categories>

Note I had to rearrange some of the categories in the AdventureWorks database to get deeper nesting.

Nested FOR XML results with SQL Server’s PATH mode

Today, while doing some work on a highly data (not object) driven .NET application, I needed a query output as XML from the application’s SQL Server 2005 database. I wanted:

  • Nicely formatted and properly mapped XML (e.g. no <row> elements as found in FOR XML RAW mode)
  • To be able to easily map columns to XML elements and attributes
  • A single root node, so I can load it into an XmlDocument without having to create the root node myself
  • Nested child elements
  • Not to have to turn my elegant little query into a huge mess of esoteric T-SQL (as with [Explicit!1!Mode])

I discovered that all of this is surprisingly easy to achieve with SQL Server 2005’s FOR XML PATH mode. (I say surprising, because I’ve tried this sort of thing with FOR XML AUTO a few times before under SQL Server 2000, and gave up each time.)

Here’s a quick example I’ve created using the venerable AdventureWorks example database, with comments against all the important bits:

SELECT
    -- Map columns to XML attributes/elements with XPath selectors.
    category.ProductCategoryID AS '@id',
    category.Name AS '@name',
    (
        -- Use a sub query for child elements.
        SELECT
            ProductID AS '@id',
            Name AS '@name',
            ListPrice AS '@price'
        FROM
            SalesLT.Product
        WHERE
            ProductCategoryID = category.ProductCategoryID
        FOR
            XML PATH('product'), -- The element name for each row.
            TYPE -- Column is typed so it nests as XML, not text.
    ) AS 'products' -- The root element name for this child collection.
FROM
    SalesLT.ProductCategory category
FOR
    XML PATH('category'), -- The element name for each row.
    ROOT('categories') -- The root element name for this result set.

As you can see, we’ve mapped columns to attributes/elements with XPath selectors, and set row and root element names with PATH() and ROOT() respectively.

Plus, by specifying my own names for everything, I was also able to smooth over the differences in capitalization, prefixing and pluralization style between the AdventureWorks table names and typical XML.

Running this query produces output in the following format. Note the root nodes for both outer and child collections:

<categories>
  <category id="4" name="Accessories" />
  <category id="24" name="Gloves">
    <products>
      <product id="858" name="Half-Finger Gloves, S" price="24.4900" />
      <product id="859" name="Half-Finger Gloves, M" price="24.4900" />
      <product id="860" name="Half-Finger Gloves, L" price="24.4900" />
      <product id="861" name="Full-Finger Gloves, S" price="37.9900" />
      <product id="862" name="Full-Finger Gloves, M" price="37.9900" />
      <product id="863" name="Full-Finger Gloves, L" price="37.9900" />
    </products>
  </category>
  <category id="35" name="Helmets">
    <products>
      <product id="707" name="Sport-100 Helmet, Red" price="34.9900" />
      <product id="708" name="Sport-100 Helmet, Black" price="34.9900" />
      <product id="711" name="Sport-100 Helmet, Blue" price="34.9900" />
    </products>
  </category>
</categories>

Strategies for resource-based 404 errors in ASP.NET MVC

There are already a couple of patterns for handling 404 errors in ASP.NET MVC.

  • For invalid routes, you can add a catch-all {*url} route to match “anything else” that couldn’t be handled by any other route.
  • For invalid controllers and controller actions, you can implement your own IControllerFactory with inbuilt error handling.
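For reference, the catch-all route from the first bullet might be registered like this. It must be mapped last so that it only matches URLs no other route handled; the Error controller and NotFound action are placeholder names:

```csharp
// Sketch: a catch-all route, registered after all other routes.
// {*url} greedily matches the entire remaining path.
routes.MapRoute(
    "CatchAll",
    "{*url}",
    new { controller = "Error", action = "NotFound" }
);
```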

But what happens when a user invokes a valid action on a valid controller, and the requested resource — product, article, sub-category etc — doesn’t exist? Let’s investigate some of the options available.

Classic ASP.NET custom errors: easy, but not that great

The easiest option is to use ASP.NET’s custom errors feature, redirecting users to special pages according to the status code of the exception that was caught.

All you need to do is to uncomment the following section from your web.config, and create a new file called FileNotFound.htm:

<customErrors mode="On">
  <error statusCode="404" redirect="FileNotFound.htm" />
</customErrors>

Then you can start throwing 404 HttpExceptions from your action methods.

// Get information about a product.
public ActionResult Detail(int? id)
{
    // Look up the product by ID.
    Product product = products.FirstOrDefault(p => p.Id == id);
    // If it wasn't found, throw a 404 HttpException.
    if (product == null)
        throw new HttpException(404, "Product not found.");
    return View(product);
}

When a user requests the detail for a non-existent product, here’s what they’ll see with a static FileNotFound.htm 404 page (style copied from the application’s MasterPage):

This method is very easy to implement but has a few disadvantages — particularly from the user’s point of view.

  • The redirection and error page return HTTP status codes 302 Found and 200 OK respectively. While most humans might not notice the difference between 302 Found and 404 Not Found, search engines certainly appreciate the distinction when indexing content. Additionally, under REST principles, redirecting to a different path is not an appropriate response when a particular resource cannot be found. The server should simply return a 404 Not Found status code.
  • The path to the requested page is passed externally, and the user is redirected to a traditional .html or .aspx page. This looks noisy and unprofessional when used alongside ASP.NET MVC’s elegant paths. The path redirection also makes it difficult for users to retype and correct badly-entered URLs.
  • These error pages are handled outside the ASP.NET MVC framework. ViewData, Models and Filters are not available — data access, logging and master pages will be implemented differently from the rest of your web site.

While they do arguably work, ASP.NET custom errors are clearly not the best tool for the job.

ASP.NET MVC HandleErrorAttribute: getting warmer

The next option is the ASP.NET MVC framework’s built-in HandleErrorAttribute, which renders a special error view when uncaught exceptions are thrown from a controller action.

// My custom exception type for 404 errors.
public class ResourceNotFoundException : Exception {}

// Render the ResourceNotFound view when a ResourceNotFoundException is thrown.
[HandleError(ExceptionType = typeof(ResourceNotFoundException),
    View = "ResourceNotFound")]
public class ProductsController : Controller
{
    // Get information about a product.
    public ActionResult Detail(int? id)
    {
        // Look up the product by ID.
        Product product = products.FirstOrDefault(p => p.Id == id);
            
        // If it wasn't found, throw a ResourceNotFoundException.
        if (product == null)
            throw new ResourceNotFoundException();
        return View(product);
    }
}

To achieve this, I defined my own ResourceNotFoundException class, and decorated my controller class with a HandleErrorAttribute. In practice you would probably decorate it twice — once for all exceptions, and once specifically for the ResourceNotFoundException type.
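The double decoration might look something like this. Note the Order values are illustrative — HandleErrorAttribute exposes an Order property for controlling which filter gets first crack at an exception, and you may need to tune it:

```csharp
// Sketch only: one attribute for 404s, one for everything else.
// The Order values shown here are an assumption for illustration.
[HandleError(ExceptionType = typeof(ResourceNotFoundException),
    View = "ResourceNotFound", Order = 2)]
[HandleError(View = "Error", Order = 1)]
public class ProductsController : Controller
{
    // ... action methods ...
}
```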

Anyway, in this example, when an action throws a ResourceNotFoundException, the HandleErrorAttribute filter will catch it and render a view called ResourceNotFound. If you don’t specify a view, HandleErrorAttribute will look for a default view called Error. I have chosen to use a different one, because I don’t want 404 errors reported in the same way as general application errors.

This renders better results — no URL redirection, and the code is contained within the ASP.NET MVC framework. However, there’s still a problem.

As you can see, the invalid resource request is coming back as a 500 Server Error. This isn’t due to any flaw in the HandleErrorAttribute class; it’s simply designed to handle application errors, not 404s. In fact, it will explicitly ignore any exceptions that have an HTTP status code other than 500.
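The guard inside HandleErrorAttribute.OnException looks roughly like this (paraphrased from the MVC source, not a verbatim copy):

```csharp
// Wrapping the caught exception in an HttpException exposes its HTTP status
// code; anything that isn't a 500 is left for some other filter to handle.
if (new HttpException(null, exception).GetHttpCode() != 500)
{
    return;
}
```

So a 404 HttpException thrown from an action sails straight past HandleErrorAttribute untouched.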

While it works pretty well, ASP.NET MVC’s HandleErrorAttribute isn’t really suited as a 404 handler.

Custom IExceptionFilter

HandleErrorAttribute is a filter — a feature of the ASP.NET MVC framework that allows you to execute code before or after a controller action is called, or when an exception is thrown. Filters are great for checking authentication, logging page requests, exception handling and other cross-cutting concerns.

Here’s a filter I’ve created — it’s effectively a clone of the built-in HandleErrorAttribute, except that it handles only ResourceNotFoundExceptions, and it sets the HTTP status code to 404 instead of 500.

public class HandleResourceNotFoundAttribute : FilterAttribute, IExceptionFilter
{
    // The view to render. Defaulting to ResourceNotFound means the
    // attribute can be applied without any arguments.
    private string view = "ResourceNotFound";

    public string View
    {
        get { return view; }
        set { view = value; }
    }

    public void OnException(ExceptionContext filterContext)
    {
    {
        Controller controller = filterContext.Controller as Controller;
        if (controller == null || filterContext.ExceptionHandled)
            return;
        Exception exception = filterContext.Exception;
        if (exception == null)
            return;
        // Action method exceptions will be wrapped in a
        // TargetInvocationException since they're invoked using
        // reflection, so we have to unwrap it.
        if (exception is TargetInvocationException)
            exception = exception.InnerException;
        // If this is not a ResourceNotFoundException error, ignore it.
        if (!(exception is ResourceNotFoundException))
            return;
        filterContext.Result = new ViewResult()
        {
            TempData = controller.TempData,
            ViewName = View
        };
        filterContext.ExceptionHandled = true;
        filterContext.HttpContext.Response.Clear();
        filterContext.HttpContext.Response.StatusCode = 404;
    }
}

As with HandleErrorAttribute, you simply decorate the controller class with a HandleResourceNotFound attribute.

// Render the ResourceNotFound view when a ResourceNotFoundException is thrown.
[HandleResourceNotFound]
public class ProductsController : Controller
{
    // Get information about a product.
    public ActionResult Detail(int? id)
    {
        // Look up the product by ID.
        Product product = products.FirstOrDefault(p => p.Id == id);
            
        // If it wasn't found, throw a ResourceNotFoundException.
        if (product == null)
            throw new ResourceNotFoundException();
        return View(product);
    }
}

This filter produces very similar output to that of the HandleErrorAttribute, but with the correct 404 HTTP status code. This is exactly what we want — an easy-to-generate 404 status code, raised and presented within the ASP.NET MVC framework.

In part 2 of this article, I’ll discuss methods for improving the quality of these error pages, and why a single generic 404 error page isn’t sufficient for a modern, rich web application.