Entity validation and LINQ: Using yield return to optimize IsValid over a list of broken rules
A common pattern for checking that an entity is valid involves testing a number of rules against it. After all the tests have run, a list of broken rules is returned.
Consider this example for validating instances of a simple Customer class:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class CustomerValidator
{
    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        IList<RuleViolation> ruleViolations = new List<RuleViolation>();

        if (String.IsNullOrEmpty(c.FirstName))
        {
            ruleViolations.Add(new RuleViolation("FirstName", "First name cannot be empty."));
        }

        if (String.IsNullOrEmpty(c.LastName))
        {
            ruleViolations.Add(new RuleViolation("LastName", "Last name cannot be empty."));
        }

        // Digits, spaces, parentheses, plus and minus only; anchored so the whole string must match.
        if (!Regex.IsMatch(c.PhoneNumber, @"^[\d ()+-]+$"))
        {
            ruleViolations.Add(new RuleViolation("PhoneNumber", "Invalid phone number."));
        }

        return ruleViolations;
    }
}
Quite often though, we don’t care about the full list of broken rules — we only care if the object is valid or not. Instead of…
IEnumerable<RuleViolation> brokenRules = customerValidator.GetAllRuleViolations(customer);
if (brokenRules.Count() > 0)
    // do stuff

...we would rather have:

if (!customerValidator.IsValid(customer))
    // do stuff
So what’s the difference between checking if an entity is valid and getting a detailed list of validation errors?
For starters, the only way to confirm that an entity is valid is to test every rule against it. Let’s assume this is a reasonably expensive operation: you might have a lot of rules, or some of them might need to check things with external systems (checking that a username doesn’t already exist in the database, for example).
If all we’re doing is checking if the entity is valid, we want to do as little work as possible. This means stopping as soon as we hit a broken rule.
The easiest way to do this is with the yield return statement. Yield return is kind of strange: it lets the caller iterate over objects one at a time as the method produces them, and the method body only runs as far as the caller asks it to. LINQ to Objects relies on the same lazy, deferred execution. Instead of filtering and reducing a collection one criterion at a time (e.g. testing 1000 objects, then re-testing the 500 objects that passed, etc.), it pulls each object through all the criteria in turn.
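To make the lazy behaviour concrete, here is a minimal sketch. The names LazyDemo, Numbers, Log and FirstOddAboveOne are made up for illustration and are not part of the validator; the log only exists to record what runs and in which order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    // Records what runs, in order (illustration only).
    public static readonly List<string> Log = new List<string>();

    // An iterator: the body runs only as the caller asks for the next item.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 4; i++)
        {
            Log.Add("producing " + i);
            yield return i;
        }
    }

    public static int FirstOddAboveOne()
    {
        Log.Clear();

        // Building the query runs nothing yet.
        IEnumerable<int> query = Numbers()
            .Where(n => { Log.Add("odd? " + n); return n % 2 == 1; })
            .Where(n => { Log.Add("above 1? " + n); return n > 1; });

        // First() pulls items one at a time and stops at the first match.
        return query.First();
    }

    static void Main()
    {
        Console.WriteLine(FirstOddAboveOne());
        foreach (string entry in Log) Console.WriteLine(entry);
    }
}
```

Each number goes through both filters before the next one is produced, and because First() stops at 3, the number 4 is never produced at all.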
In this case, we simply want to return as soon as a broken rule is encountered.
class CustomerValidator
{
    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        if (String.IsNullOrEmpty(c.FirstName))
        {
            yield return new RuleViolation("FirstName", "First name cannot be empty.");
        }

        if (String.IsNullOrEmpty(c.LastName))
        {
            yield return new RuleViolation("LastName", "Last name cannot be empty.");
        }

        // Digits, spaces, parentheses, plus and minus only; anchored so the whole string must match.
        if (!Regex.IsMatch(c.PhoneNumber, @"^[\d ()+-]+$"))
        {
            yield return new RuleViolation("PhoneNumber", "Invalid phone number.");
        }
    }
}
See that? The method is still declared as returning a sequence, but instead of building a list it has three yield return statements that each hand back a single object. With this, we can use a little bit of LINQ to break out of the method early:
public bool IsValid(Customer c)
{
    // are there any rule violations?
    return !GetAllRuleViolations(c).Any();
}
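To see how much work Any() actually skips, and one caveat that comes with iterator methods, here is a rough sketch. EnumerationCost, GetViolations and the RulesEvaluated counter are hypothetical stand-ins for the validator, simplified to two rules on plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerationCost
{
    // Hypothetical counter, here only to make the work visible.
    public static int RulesEvaluated;

    // Simplified stand-in for GetAllRuleViolations with two rules.
    public static IEnumerable<string> GetViolations(string firstName, string lastName)
    {
        RulesEvaluated++;
        if (string.IsNullOrEmpty(firstName)) yield return "First name cannot be empty.";

        RulesEvaluated++;
        if (string.IsNullOrEmpty(lastName)) yield return "Last name cannot be empty.";
    }

    // Rules run by a single Any() call on an invalid entity.
    public static int CostOfAny()
    {
        RulesEvaluated = 0;
        GetViolations("", "").Any();
        return RulesEvaluated;
    }

    // Rules run when the same lazy sequence is enumerated twice.
    public static int CostOfAnyThenCount()
    {
        RulesEvaluated = 0;
        IEnumerable<string> lazy = GetViolations("", "");
        bool invalid = lazy.Any();   // runs the first rule only
        int count = lazy.Count();    // starts over and runs both rules
        return RulesEvaluated;
    }

    // Rules run when the results are materialized once and reused.
    public static int CostOfToList()
    {
        RulesEvaluated = 0;
        List<string> all = GetViolations("", "").ToList();
        bool invalid = all.Count > 0;
        int count = all.Count;
        return RulesEvaluated;
    }

    static void Main()
    {
        Console.WriteLine(CostOfAny());          // 1
        Console.WriteLine(CostOfAnyThenCount()); // 3
        Console.WriteLine(CostOfToList());       // 2
    }
}
```

Any() on its own runs a single rule here. The catch is that every enumeration of an iterator starts the method again, so if you need both the yes/no answer and the full list, it is cheaper to call ToList() once and work from that.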
I whipped up a little test app to prove this — IsValid vs GetAllRuleViolations.
CustomerValidator validator = new CustomerValidator();

// An invalid customer
Customer customer = new Customer()
{
    FirstName = "",
    LastName = "",
    PhoneNumber = "asdsd"
};

// do as little work as possible to work out if the entity is valid or not
Console.WriteLine("IsValid = ");
Console.WriteLine(validator.IsValid(customer));

// check everything for a full report
Console.WriteLine("GetAllRuleViolations().Count =");
Console.WriteLine(validator.GetAllRuleViolations(customer).Count());
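Here’s what it produces. (For this run, the validator writes a line to the console before it tests each rule.) Note that IsValid returns immediately after the first rule fails, while GetAllRuleViolations runs all three.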
IsValid = 
Testing first name
False
GetAllRuleViolations().Count =
Testing first name
Testing last name
Testing phone number
3
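The listings above leave out the Customer and RuleViolation classes and the tracing that prints the "Testing ..." lines, so here is one way a complete, runnable version of the test app might look. The shapes of Customer and RuleViolation, the Trace list and the Note helper are all assumptions made for this sketch, as is treating a null phone number as an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed minimal shapes for the types the article uses but never shows.
class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string PhoneNumber { get; set; }
}

class RuleViolation
{
    public string PropertyName { get; private set; }
    public string ErrorMessage { get; private set; }

    public RuleViolation(string propertyName, string errorMessage)
    {
        PropertyName = propertyName;
        ErrorMessage = errorMessage;
    }
}

class CustomerValidator
{
    // Hypothetical trace of the rules that actually ran.
    public readonly List<string> Trace = new List<string>();

    private void Note(string message)
    {
        Trace.Add(message);
        Console.WriteLine(message);
    }

    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        Note("Testing first name");
        if (String.IsNullOrEmpty(c.FirstName))
            yield return new RuleViolation("FirstName", "First name cannot be empty.");

        Note("Testing last name");
        if (String.IsNullOrEmpty(c.LastName))
            yield return new RuleViolation("LastName", "Last name cannot be empty.");

        Note("Testing phone number");
        // Assumption: a null phone number is treated as empty, and so as invalid.
        if (!Regex.IsMatch(c.PhoneNumber ?? "", @"^[\d ()+-]+$"))
            yield return new RuleViolation("PhoneNumber", "Invalid phone number.");
    }

    public bool IsValid(Customer c)
    {
        // Any() stops pulling from the iterator at the first violation.
        return !GetAllRuleViolations(c).Any();
    }
}

static class Program
{
    static void Main()
    {
        var validator = new CustomerValidator();
        var customer = new Customer { FirstName = "", LastName = "", PhoneNumber = "asdsd" };

        Console.WriteLine("IsValid = ");
        Console.WriteLine(validator.IsValid(customer));

        Console.WriteLine("GetAllRuleViolations().Count =");
        Console.WriteLine(validator.GetAllRuleViolations(customer).Count());
    }
}
```

Keep in mind that the saving only applies to invalid entities. For a valid customer, IsValid still has to run all three rules, because that is the only way to confirm that none of them is broken.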