Duct-tape programmers ship… once

Duct-tape programmers ship… once

I read an interesting article today full of praise for duct-tape programmers, a breed of developers who are smart enough to get away without unit tests, and ship code early by using duct-tape style programming techniques.

Duct-tape programmers are considered great because they help companies take first-mover advantage, while others waste time developing abstract frameworks and trying to over-engineer everything. Customers and project managers love duct-tape programmers because they get the job done fast.

I see duct-tape programmers in a different way:

Of course you need to ship products fast to beat your competition. But duct-tape isn’t a sustainable way to build software, unless you have the luxury of never having to touch the code again (one-off console games come to mind here). Otherwise, maintenance costs from brittle code will soon come back to bite and affect your ability to ship version 2.0 in time.

Our challenge is to find a balance in the middle where we aim high, but have an architecture flexible enough to allow duct-tape-style compromises in a clean and isolated fashion when required.

Entity validation and LINQ: Using yield return to optimize IsValid over a list of broken rules

Entity validation and LINQ: Using yield return to optimize IsValid over a list of broken rules

A common pattern for checking an entity is valid involves testing a number of rules on it. After all tests have been performed, a list of broken rules is returned.

Consider this example for validating instances of a simple Customer class:

class CustomerValidator{    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)    {        IList<RuleViolation> ruleViolations = new List<RuleViolation>();        if (String.IsNullOrEmpty(c.FirstName))        {            ruleViolations.Add(new RuleViolation("FirstName",                 "First name cannot be empty."));        }        if (String.IsNullOrEmpty(c.LastName))        {            ruleViolations.Add(new RuleViolation("LastName",                 "Last name cannot be empty."));        }        if (!Regex.IsMatch(c.PhoneNumber, @"[d ()+-]+"))        {            ruleViolations.Add(new RuleViolation("PhoneNumber",                 "Invalid phone number."));        }        return ruleViolations;    }}

Quite often though, we don’t care about the full list of broken rules — we only care if the object is valid or not. Instead of…

IEnumerable<RuleViolation> brokenRules =     customerValidator.GetAllRuleViolations(customer);if (brokenRules.Count() > 0)    // do stuff</pre>...we would rather have:</p><pre lang="csharp">if (!customerValidator.IsValid(customer))    // do stuff

So what’s the difference between checking if an entity is valid and getting a detailed list of validation errors?

For starters, the only way of finding out if an entity is valid is by testing all the rules against it. Let’s assume this is a reasonably intensive operation — if you have a lot of rules, or need to check things with external systems (checking a username doesn’t already exist in the database, for example).

If all we’re doing is checking if the entity is valid, we want to do as little work as possible. This means stopping as soon as we hit a broken rule.

The easiest way to do this is with the yield return keyword. Yield return is kind of strange — it lets you iterate over objects as they are returned from a method. This is used for evaluating LINQ expression trees. Instead of filtering and reducing a collection one criteria at a time — e.g. testing 1000 objects, then re-testing the 500 objects that passed, etc — it tests each object against all the criteria at once.

In this case, we simply want to return as soon as a broken rule is encountered.

class CustomerValidator{    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)    {        if (String.IsNullOrEmpty(c.FirstName))        {            yield return new RuleViolation("FirstName",                 "First name cannot be empty.");        }                if (String.IsNullOrEmpty(c.LastName))        {            yield return new RuleViolation("LastName",                 "Last name cannot be empty.");        }        if (!Regex.IsMatch(c.PhoneNumber, @"[d ()+-]+"))        {            yield return new RuleViolation("PhoneNumber",                 "Invalid phone number.");        }    }}

See that? The method is still defined as returning a collection, but it has three return statements with a single object each. With this, we can use a little bit of LINQ to break out of the method early:

    public bool IsValid(Customer c)    {        // are there any rule violations?        return !GetAllRuleViolations(c).Any();    }

I whipped up a little test app to prove this — IsValid vs GetAllRuleViolations.

CustomerValidator validator = new CustomerValidator();// An invalid customerCustomer customer = new Customer(){    FirstName = "",    LastName = "",    PhoneNumber = "asdsd"};// do as little work as possible to work out if the entity is valid or notConsole.WriteLine("IsValid = ");Console.WriteLine(validator.IsValid(customer));// check everything for a full reportConsole.WriteLine("GetAllRuleViolations().Count =");Console.WriteLine(validator.GetAllRuleViolations(customer).Count());

Here’s what it produces. Note that IsValid returns immediately after the first rule fails.

IsValid =        Testing first nameTrueGetAllRuleViolations().Count =        Testing first name        Testing last name        Testing phone number3

Find missing foreign/primary keys in SQL Server

Find missing foreign/primary keys in SQL Server

Last week I wrote a SQL query to estimate how many columns are missing from foreign or primary keys. This works because of our naming convention for database keys:

  • We use a Code suffix for natural keys e.g. CountryCode = NZ
  • We use an ID suffix for surrogate keys e.g. EmployeeID = 32491

This script looks for any columns that match this naming pattern, but aren’t part of a primary or foreign key relationship.

-- Find columns on tables with names like FooID or FooCode which should-- be part of primary or foreign keys, but aren't.SELECT        t.name AS [Table],        c.name AS [Column]FROM        sys.tables t        INNER JOIN sys.syscolumns c ON                c.id = t.object_id                -- Join on foreign key columns        LEFT JOIN sys.foreign_key_columns fkc ON                (fkc.parent_object_id = t.object_id                AND c.colid = fkc.parent_column_id)                OR (fkc.referenced_object_id = t.object_id                AND c.colid = fkc.referenced_column_id)                -- Join on primary key columns        LEFT JOIN sys.indexes i ON                i.object_id = t.object_id                and i.is_primary_key = 1        LEFT JOIN sys.index_columns ic ON                ic.object_id = t.object_id                AND ic.index_id = i.index_id                AND ic.column_id = c.colidWHERE        t.is_ms_shipped = 0        AND (c.name LIKE '%ID' OR c.name LIKE '%Code')        AND        (                fkc.constraint_object_id IS NULL -- Not part of a foreign key                 AND ic.object_id IS NULL -- Not part of a primary key        )        AND        (                -- Ignore some tables                t.name != 'sysdiagrams'                AND t.name NOT LIKE '[_]%' -- temp tables                AND t.name NOT LIKE '%temp%'                AND t.name NOT LIKE '%Log%' -- log tables                                -- Ignore some columns                AND c.name NOT IN ('GLCode', 'EID', 'AID') -- external keys        )ORDER BY        t.name,        c.name

Using this script, I found over 200 missing foreign keys in one production database!