Mandatory overloads

Yesterday I read a pretty sensible tweet by Jeremy D. Miller:

[Tweet from Jeremy D. Miller: "…(args), you almost always need a Method(Type, args) too."]

In spirit, I would like to propose another:

When you have a method like Method(IEnumerable<T> items), you should always provide a Method(params T[] items) too.
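To illustrate the pattern, here's a minimal sketch (the Reports.Print helper is hypothetical):

public static class Reports
{
    // Core implementation: works over any sequence of items.
    public static void Print(IEnumerable<string> items)
    {
        foreach (string item in items)
            Console.WriteLine(item);
    }

    // Convenience overload: lets callers pass items inline.
    // The cast forces the IEnumerable<string> overload, so this
    // method doesn't just call itself again.
    public static void Print(params string[] items)
    {
        Print((IEnumerable<string>)items);
    }
}

Callers can now use whichever form is most convenient: Reports.Print(customerNames) for an existing collection, or Reports.Print("a", "b", "c") for inline arguments.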

Duct-tape programmers ship… once

I read an interesting article today full of praise for duct-tape programmers, a breed of developers who are smart enough to get away without unit tests, and ship code early by using duct-tape style programming techniques.

Duct-tape programmers are considered great because they help companies take first-mover advantage, while others waste time developing abstract frameworks and trying to over-engineer everything. Customers and project managers love duct-tape programmers because they get the job done fast.

I see duct-tape programmers in a different way:

Of course you need to ship products fast to beat your competition. But duct tape isn't a sustainable way to build software unless you have the luxury of never having to touch the code again (one-off console games come to mind here). Otherwise, the maintenance cost of brittle code will soon come back to bite you and affect your ability to ship version 2.0 on time.

Our challenge is to find a balance in the middle where we aim high, but have an architecture flexible enough to allow duct-tape-style compromises in a clean and isolated fashion when required.

Entity validation and LINQ: Using yield return to optimize IsValid over a list of broken rules

A common pattern for checking an entity is valid involves testing a number of rules on it. After all tests have been performed, a list of broken rules is returned.

Consider this example for validating instances of a simple Customer class:

class CustomerValidator
{
    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        IList<RuleViolation> ruleViolations = new List<RuleViolation>();

        if (String.IsNullOrEmpty(c.FirstName))
        {
            ruleViolations.Add(new RuleViolation("FirstName",
                "First name cannot be empty."));
        }

        if (String.IsNullOrEmpty(c.LastName))
        {
            ruleViolations.Add(new RuleViolation("LastName",
                "Last name cannot be empty."));
        }

        if (!Regex.IsMatch(c.PhoneNumber, @"[\d ()+-]+"))
        {
            ruleViolations.Add(new RuleViolation("PhoneNumber",
                "Invalid phone number."));
        }

        return ruleViolations;
    }
}

Quite often though, we don’t care about the full list of broken rules — we only care if the object is valid or not. Instead of…

IEnumerable<RuleViolation> brokenRules =
    customerValidator.GetAllRuleViolations(customer);

if (brokenRules.Count() > 0)
    // do stuff

...we would rather have:

if (!customerValidator.IsValid(customer))
    // do stuff

So what’s the difference between checking if an entity is valid and getting a detailed list of validation errors?

For starters, the only way to find out whether an entity is valid is to test all the rules against it. Let's assume this is a reasonably expensive operation; say you have a lot of rules, or need to check things against external systems (verifying a username doesn't already exist in the database, for example).

If all we’re doing is checking if the entity is valid, we want to do as little work as possible. This means stopping as soon as we hit a broken rule.

The easiest way to do this is with the yield return keyword. Yield return is kind of strange: instead of building up a collection and returning it at the end, the method produces each object lazily, as the caller iterates over it. LINQ to Objects relies on the same deferred execution. Instead of filtering and reducing a collection one criterion at a time (e.g. testing 1000 objects, then re-testing the 500 that passed, and so on), it streams each object through all the criteria at once.
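To see that laziness in action, here's a tiny sketch (the Numbers iterator is a made-up example; Any() comes from System.Linq):

static IEnumerable<int> Numbers()
{
    Console.WriteLine("Yielding 1");
    yield return 1;

    Console.WriteLine("Yielding 2");
    yield return 2;
}

// Any() stops pulling elements as soon as it sees one,
// so "Yielding 2" is never printed.
bool hasAny = Numbers().Any();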

In this case, we simply want to return as soon as a broken rule is encountered.

class CustomerValidator
{
    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        if (String.IsNullOrEmpty(c.FirstName))
        {
            yield return new RuleViolation("FirstName",
                "First name cannot be empty.");
        }

        if (String.IsNullOrEmpty(c.LastName))
        {
            yield return new RuleViolation("LastName",
                "Last name cannot be empty.");
        }

        if (!Regex.IsMatch(c.PhoneNumber, @"[\d ()+-]+"))
        {
            yield return new RuleViolation("PhoneNumber",
                "Invalid phone number.");
        }
    }
}

See that? The method is still defined as returning a collection, but now it has three yield return statements, each producing a single RuleViolation. With this, we can use a little bit of LINQ to break out of the method early:

public bool IsValid(Customer c)
{
    // are there any rule violations?
    return !GetAllRuleViolations(c).Any();
}
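Any() is a better fit here than the Count() > 0 check from earlier: Any() returns true as soon as the iterator yields its first broken rule, while Count() has to run every remaining rule just to produce a number.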

I whipped up a little test app to prove this — IsValid vs GetAllRuleViolations.

CustomerValidator validator = new CustomerValidator();

// An invalid customer
Customer customer = new Customer()
{
    FirstName = "",
    LastName = "",
    PhoneNumber = "asdsd"
};

// do as little work as possible to work out if the entity is valid or not
Console.WriteLine("IsValid = ");
Console.WriteLine(validator.IsValid(customer));

// check everything for a full report
Console.WriteLine("GetAllRuleViolations().Count =");
Console.WriteLine(validator.GetAllRuleViolations(customer).Count());

Here's what it produces. (The validator in the test app writes a trace line as it checks each rule.) Note that IsValid returns immediately after the first rule fails.

IsValid =
        Testing first name
True
GetAllRuleViolations().Count =
        Testing first name
        Testing last name
        Testing phone number
3

Back to basics: good source control check-in habits

The other day at work I went over a few good source control habits (TFS-centric) for new developers, and why they’re worth doing. Here are some tips:

Check-ins are coarse-grained

When I first started using source control, a lot of my check-in logs looked like this:

  1. Added useful function foo() (that will be ultimately required for feature X)
  2. Added feature X

I was trying to be helpful by making foo() available for use as early as possible, but most of the time I checked it in too early and needed to change or fix it later, resulting in additional check-ins. In reality, all this did was clutter up the change history with noise, making it harder to establish which check-ins were actually required to effect a change.

It also, of course, violates the YAGNI ('you ain't gonna need it') principle: no one ever needed foo() badly enough to justify checking it in early.

Check-ins should be coarse-grained: instead of staggering code changes over a period of time, save them all for one big changeset that contains everything. This actually reinforces YAGNI; instead of adding endless low-level functions 'because they might be useful', you're forced to associate them with a higher, application-level change.

Never check in unfinished work

In the main (trunk) branch, there are two cardinal rules:

  • Never check in anything that breaks the build
  • Never check in an unfinished feature

The main branch should always be in a state where it’s ready for release. Features have been entirely added or don’t exist yet; complete changes have been made, or not even started. There is no middle ground.

Having to worry about the possibly unfinished state of the ‘master copy’ is just another burden on your team. Even worse, if you check in code that doesn’t compile, you can interrupt other people’s work and waste their time commenting out bad blocks and working around it.

Sometimes, however, you’ll get a piece of work that doesn’t fit in one check-in.

  • It could take a long time to implement (needs more than one check-in).
  • Multiple developers might need to work on it.
  • The change isn’t going to be released (committed to the main branch) until a later date.

You have two options: fork a copy of the code into a new development branch, or shelve your changes so you or someone else can pick them up later. The basic rule: if you need more than one check-in to make a change, branch and merge back in later.

Source control is not for backups

This follows from the previous two tips. Check in your work because it's completely done and finished, not because it's the end of the day or the weekend.

I think this habit often begins when using reserved-checkout source control systems like VSS, where having a file checked out explicitly blocks other people from editing it. Everyone who’s used VSS knows from experience that hilarious situation where you need to edit a checked-out file, but the person who has it is on holiday or otherwise unreachable.

[Screenshot: VSS 'file exclusively checked out' error]

To avoid this, teams adapt their work habits to suit VSS, checking in frequently so as not to lock files for extended periods. This practice is continued for some reason (cargo cult programming?) when moving to less brain-damaged source control tools like TFS or subversion.

Anyway, if you want nightly backups of your local working source code, get some backup software. Don’t use source control — change tracking becomes difficult when you check in based on calendar events. It gets even more ridiculous if you have changes that don’t compile yet, and have to comment/uncomment code when checking it in.

Use a separate check-in when reformatting code

Say you go to edit a file, but the formatting is all messed up — the indenting is wrong. So you fix it up while making your changes. But then, if someone comes along later to see what your change entailed, a diff tool will highlight differences on every line, making it very hard to see what the actual code change was. Instead, use a separate check-in for mass code reformatting.

Write useful check-in comments

If you're looking over the history of a file in source control, comments like 'css fix' or '#12345' won't be very useful. Instead, good check-in comments need three things:

  • What was changed
  • Why it was changed
  • Any bug/issue tracking numbers (if applicable)

Describe what you changed, and why it was necessary. You don’t have to write an essay — I find one sentence of ‘did X because Y’ is usually sufficient.
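For example, a (made-up) comment like 'Escaped customer names on the invoice page because apostrophes were breaking the HTML layout (bug #4521)' tells a future reader what happened, why, and where to look for more context.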

[Screenshot: TFS change history]

You should also list any bug/issue tracking numbers, or make reference to any relevant project documentation (e.g. a list of requirements) to provide higher-level context. This is a good way of establishing traceability on tightly-controlled source repositories; if you do it right, you'll find almost every change can be traced back to some external identifier.

I also like to paste my changeset number when I update an issue as being completed (ready for user testing or peer review) — the easier it is to find my changes, the better! (Note that some source control tools can do this automatically from your check-in comment).

Real-life examples

You can see good check-in habits in action in most popular open source projects. Here are a few examples:

  • Boost C++ Libraries revision log
  • Ruby on Rails commit history
  • Subtext revision log

Implementing access control in a tiered application

Recently I have been working on a reasonably-large ERP system that supports around 2,000 users. The system has approximately forty access control groups to which individual users can belong. Each of these groups grants or restricts access to parts of the system. For example, a user may be allowed to edit the price of a product if they are in the SalesManagers group. If they are in the InventoryManagers group, they can edit the number of items in stock. If they are in the Admins group they can do everything.

public class User
{
    // A list of groups the user belongs to.
    public List<string> MembershipGroups { get; }

    // ... etc
}

To handle all these access groups and their associated rules, a form page's load method contained several screenfuls of business logic to determine which elements should be editable by the user and which should not.

Here’s a simple example of what it might look like on an EditProduct page:

public class EditProductPage : Page
{
    // Form elements.
    private TextBox ProductName;
    private TextBox Price;
    private TextBox NumberInStock;

    public void InitializeForm()
    {
        if (CurrentUser.MembershipGroups.Contains("Admins"))
        {
            // Admins can change anything.
            ProductName.Enabled = true;
            Price.Enabled = true;
            NumberInStock.Enabled = true;
        }
        else if (CurrentUser.MembershipGroups.Contains("SalesManagers"))
        {
            // SalesManagers can change the product's price.
            ProductName.Enabled = false;
            Price.Enabled = true;
            NumberInStock.Enabled = false;
        }
        else if (CurrentUser.MembershipGroups.Contains("InventoryManagers"))
        {
            // InventoryManagers can change the inventory levels.
            ProductName.Enabled = false;
            Price.Enabled = false;
            NumberInStock.Enabled = true;
        }
        else
        {
            // Regular users can't do anything.
            ProductName.Enabled = false;
            Price.Enabled = false;
            NumberInStock.Enabled = false;
        }

        // ... other complex business logic
    }
}

Now this is all good and well, but what happens if someone is a member of both SalesManagers and InventoryManagers? Or if we add rules for editing certain types of products? What happens when we need another form? This code will grow exponentially and soon spaghetti out of control.

I believe there are two fundamental problems with this design. Firstly, it is not clear what the different access groups mean in the context of the form. For example, if you’re a member of the InventoryManagers group, which fields should be editable and which should be locked? These rules are probably defined clearly in a requirements document somewhere, but are difficult to understand from reading the source code.

The second problem is that these rules are implemented at each point they are consumed, i.e. in the presentation layer. Each form is free to access and interpret what each group means. This logic clearly doesn't belong here: a form page should only know how to hide and show elements! On the other hand, the responsibility doesn't lie exclusively with the business layer either, as it relates to form-specific elements. So where should it go?

In a well-designed application, the relationship between group membership and access privileges would not be exposed to the UI. Instead, we should define a list of easily-understood permission flags that have direct meaning to consumers of our model. Each of these flags will explicitly define what a user can and cannot do. They must be as simple to understand as possible.

public class AccessPermissions
{
    // Correct: these flags can be applied directly to a form.
    public bool CanEditName { get; }
    public bool CanEditPrice { get; }
    public bool CanEditNumberInStock { get; }

    // Incorrect: this flag is too vague.
    public bool IsAdmin { get; }
}

We can then add these permissions to our User class:

public class User
{
    // A list of groups the user belongs to.
    public List<string> MembershipGroups { get; }

    // The user's access permissions.
    public AccessPermissions Permissions { get; }

    // ... etc
}

Setting these flags may involve complex and conditional business logic, which is now removed from the presentation layer. These simplified flags can then be directly used by the form’s Supervising Controller:

public class ProductEditFormController : ISupervisingController
{
    // The passive view for the current form.
    private IEditFormView Form;

    // Set up form state.
    public void InitializeForm()
    {
        AccessPermissions permissions = CurrentUser.Permissions;

        if (permissions.CanEditName)
            Form.ProductName.Editable = true;

        if (permissions.CanEditPrice)
            Form.Price.Editable = true;

        if (permissions.CanEditNumberInStock)
            Form.NumberInStock.Editable = true;

        // ... etc
    }
}
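As one possibility, here's a rough sketch of where the group-to-flag logic could live: a hypothetical factory in the business layer (the names and rules are illustrative, and it assumes AccessPermissions allows its flags to be set during construction):

public class AccessPermissionsFactory
{
    // Illustrative rules only; the real group-to-flag mapping
    // would come from the requirements document.
    public AccessPermissions CreateFor(User user)
    {
        bool isAdmin = user.MembershipGroups.Contains("Admins");

        return new AccessPermissions
        {
            CanEditName = isAdmin,
            CanEditPrice = isAdmin
                || user.MembershipGroups.Contains("SalesManagers"),
            CanEditNumberInStock = isAdmin
                || user.MembershipGroups.Contains("InventoryManagers")
        };
    }
}

The UI never needs to know these rules exist; it just reads the flags.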

Privileges should be separated amongst multiple classes, specific to different areas of functionality. For example, you might provide a UserPermissions class for a user management form:

// Permissions related to user management.
public class UserPermissions
{
    public bool CanEditName { get; }
    public bool CanEditMembershipGroups { get; }
    public bool CanDeleteUser { get; }
}

By removing access-level logic from the point of consumption and declaring permissions as explicit, immediately usable flags, we make it much easier for developers to apply access control consistently. Removing potentially complex business logic from the presentation layer cuts down on code duplication (and bugs), and the overall quality of the system improves.