SQL Notifications: not very practical for large data sets

I ran into an interesting problem today with command-based SQL Notifications. We’ve recently introduced SysCache2 on a project as NHibernate’s level 2 cache provider, because of its ability to invalidate regions when underlying data changes, and I already wrote about some issues we had with it. Unfortunately we hit another roadblock today, this time with the notification queries themselves.

Here’s the offending config:

<syscache2>
    <cacheRegion name="Tasks" relativeExpiration="9999999">
        <dependencies>
            <commands>
                <add command="SELECT ID FROM Task" connectionName="Xyz" />
                <add command="SELECT ID FROM TaskUser" connectionName="Xyz" />
                ...
            </commands>
        </dependencies>
    </cacheRegion>
</syscache2>

Can you spot the bug here that will result in an unhandled exception in ISession.Get()?

Nope? Neither could I for most of this afternoon.

Query for notification timeout errors

The problem is that the Task and TaskUser tables have four and six million rows respectively. SELECT ID FROM TaskUser takes over 90 seconds to execute, well beyond ADO.NET’s default 30-second command timeout, so the subscription fails with a timeout exception. Even if it did complete, by the time we had re-subscribed to the query notification, new data would already have been written by other users.

Depending on your exact scenario, you have several options:

  1. Refactor the database schema to remove rows from these tables that aren’t likely to change.
  2. Accept the slow query subscription.
  3. Enable caching, but ignore changes from these tables.
  4. Limit the command to only cover rows that are likely to change, e.g. SELECT ID FROM Task WHERE YEAR(DueDate) = 2009.
  5. Disable level 2 cache for these entities entirely.

Accepting the slow query subscription only works if you have very infrequent writes to the table, where it is worth caching rows for a long time.

For us, the high frequency of writes to these tables means that we would be invalidating the cache region all the time, and the limited sharing of data between users means caching offers little benefit anyway. Also, blocking an HTTP request thread for 90 seconds is not feasible. So we chose the last option and now don’t bother caching these tables at all.

By the way, while working on this problem, I submitted my first patch to NH Contrib that adds a commandTimeout setting to SysCache2.

RSS: What I’m reading

Lately I’ve had a few people asking for resources and links to articles about TDD and DDD. I’m a big RSS junkie, so I thought I would just share the feeds I’m currently reading.

I mostly read programming blogs, but you can see a few hints of my hidden UX aspirations as well 🙂

  • 456 Berea Street
  • A List Apart
  • Ayende @ Rahien
  • CodeBetter.Com – Stuff you need to Code Better!
  • {codesqueeze}
  • Coding Horror
  • Coding Instinct
  • Devlicio.us
  • DotNetKicks.com
  • Hendry Luk — Sheep in Fence
  • HunabKu
  • James Newton-King
  • Jimmy Bogard
  • Joel on Software
  • Jon Kruger’s Blog
  • jp.hamilton
  • Lean and Kanban
  • Los Techies
  • Paul Graham: Essays
  • programming – top reddit links
  • Random Code
  • Scott Hanselman’s Computer Zen
  • Scott Muc
  • ScottGu’s Blog
  • Seth’s Blog
  • Smashing Magazine
  • Thoughts From Eric » Tech
  • Udi Dahan – The Software Simplist
  • UI Scraps
  • UX Magazine
  • xProgramming.com
  • you’ve been HAACKED

Domain entities vs presentation model objects

This is the first half of a two-part article. Read the second half here: Domain entities vs presentation model objects, part 2: mapping.

When putting domain driven design (DDD) principles to practice in a real application, it’s important to recognize the difference between your domain model and presentation model.

Here’s a simple real-life example from a little MVC application that displays a to-do list of tasks. The task domain entity is very simple, with an identifier, a name and an optional due date:

// task domain entity
public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
    // ...etc
}

Pretty much everything we want to know about this Task can be derived from its properties via Specifications. For example, an OverDueTaskSpecification will tell us whether or not the task is overdue by checking if the due date has already passed, and an UnscheduledTaskSpecification will tell us if the task is unscheduled by checking if the due date is null.
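As a rough sketch of what those two specifications could look like: the ISpecification<T> interface and the idea of passing the current time into the constructor are my assumptions for illustration, not code from the application itself.

```csharp
using System;

// Same shape as the Task entity shown above.
public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
}

// Hypothetical minimal specification interface (an assumption for this sketch).
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

// A task is unscheduled when it has no due date.
public class UnscheduledTaskSpecification : ISpecification<Task>
{
    public bool IsSatisfiedBy(Task task)
    {
        return !task.DueDate.HasValue;
    }
}

// A task is overdue when it has a due date that is earlier than "now".
// The current time is injected so the rule is easy to test.
public class OverDueTaskSpecification : ISpecification<Task>
{
    private readonly DateTime now;

    public OverDueTaskSpecification(DateTime now)
    {
        this.now = now;
    }

    public bool IsSatisfiedBy(Task task)
    {
        return task.DueDate.HasValue && task.DueDate.Value < now;
    }
}
```

Keeping each rule in its own small class means the same check can be reused wherever the question comes up, rather than being re-derived in the view.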

However, when rendering the task to the user, the application’s view must remain dumb — a passive view — and cannot work this sort of stuff out for itself. It is not enough to simply pass a collection of domain entities to the view; all relevant details must be explicitly provided so the view has all the information it requires without having to do any work itself.

Together, all these UI-specific details form a presentation model object, which is effectively identical to a DTO in that it only carries data and has no methods or behaviour of its own (some people call them PTOs). Here’s what my application’s presentation model object for a task looks like:

// task presentation model object
public class TaskView
{
    public int Id;
    public string Name;
    public string DueDate;
    public bool IsUnscheduled;
    public bool IsOverDue;
    public long SortIndex;
}

The ID and name fields are exactly the same, but DueDate is now a string which will either hold a friendly-formatted date or ‘Anytime’ if the task is not scheduled. Unscheduled and overdue are now explicit flags so the view can immediately identify tasks that need special display like highlighting.

Most of the front-end user interaction in this application is implemented via JavaScript, so we need an index for sorting and appending tasks in the correct order. It needs to be simple to parse and fast to compare, so I chose a long integer which is resolved from the due date.
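To make the mapping concrete, here is a minimal sketch of how a Task might be turned into a TaskView. The TaskViewMapper class, the "d MMM yyyy" date format, and the yyyyMMdd-style sort index are all assumptions chosen for illustration; the real application may well resolve them differently.

```csharp
using System;
using System.Globalization;

public class Task
{
    public int Id;
    public string Name;
    public DateTime? DueDate;
}

public class TaskView
{
    public int Id;
    public string Name;
    public string DueDate;
    public bool IsUnscheduled;
    public bool IsOverDue;
    public long SortIndex;
}

// Hypothetical mapper (not from the original application).
public static class TaskViewMapper
{
    // Assumed sentinel: larger than any yyyyMMdd value, so unscheduled tasks sort last.
    private const long UnscheduledSortIndex = 99999999L;

    public static TaskView ToView(Task task, DateTime now)
    {
        bool unscheduled = !task.DueDate.HasValue;
        DateTime due = task.DueDate.GetValueOrDefault();

        return new TaskView
        {
            Id = task.Id,
            Name = task.Name,
            // Friendly text for the view; the exact format is an assumption.
            DueDate = unscheduled
                ? "Anytime"
                : due.ToString("d MMM yyyy", CultureInfo.InvariantCulture),
            IsUnscheduled = unscheduled,
            IsOverDue = !unscheduled && due < now,
            // yyyyMMdd as a number: small enough to compare safely in JavaScript.
            SortIndex = unscheduled
                ? UnscheduledSortIndex
                : due.Year * 10000L + due.Month * 100 + due.Day
        };
    }
}
```

A date-shaped number like this stays well inside the range JavaScript can represent exactly, which would not be true of something like DateTime.Ticks.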

Note this is a very simple example with only one domain entity class which we just added some new properties to. Most real scenarios will require some degree of flattening aggregate object graphs into a single class, and hiding of fields which are not relevant.

In the grand scheme of things, presentation model objects are defined in the application layer (see the Onion Architecture), where they are validated and mapped to and from domain entities. All public interfaces to the application layer are expressed in terms of presentation model objects, so domain entities do not leak into the UI layer.

In an ASP.NET MVC application for example, presentation model objects would be used as the Models for all views and action results.

Update: a friend asked me why I don’t format DueDate in the view, as proper MVC separation of concerns would dictate. The reason is that, as well as rendering presentation model objects directly in the view when a page is requested, this application also sends them to the client via AJAX. I decided that having one single place (instead of two) where tasks’ DueDates are formatted would make maintenance easier in the long run.

Entity validation and LINQ: Using yield return to optimize IsValid over a list of broken rules

A common pattern for checking that an entity is valid involves testing a number of rules against it. After all tests have been performed, a list of broken rules is returned.

Consider this example for validating instances of a simple Customer class:

class CustomerValidator
{
    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        IList<RuleViolation> ruleViolations = new List<RuleViolation>();

        if (String.IsNullOrEmpty(c.FirstName))
        {
            ruleViolations.Add(new RuleViolation("FirstName",
                "First name cannot be empty."));
        }

        if (String.IsNullOrEmpty(c.LastName))
        {
            ruleViolations.Add(new RuleViolation("LastName",
                "Last name cannot be empty."));
        }

        // anchored so the whole string must consist of digits, spaces, ( ) + and -
        if (!Regex.IsMatch(c.PhoneNumber, @"^[\d ()+-]+$"))
        {
            ruleViolations.Add(new RuleViolation("PhoneNumber",
                "Invalid phone number."));
        }

        return ruleViolations;
    }
}

Quite often though, we don’t care about the full list of broken rules — we only care if the object is valid or not. Instead of…

IEnumerable<RuleViolation> brokenRules =
    customerValidator.GetAllRuleViolations(customer);
if (brokenRules.Count() > 0)
    // do stuff

...we would rather have:

if (!customerValidator.IsValid(customer))
    // do stuff

So what’s the difference between checking if an entity is valid and getting a detailed list of validation errors?

For starters, the only way of finding out that an entity is valid is by testing all the rules against it. Let’s assume this is a reasonably expensive operation, for example because you have a lot of rules or need to check things with external systems (checking a username doesn’t already exist in the database, for example).

If all we’re doing is checking if the entity is valid, we want to do as little work as possible. This means stopping as soon as we hit a broken rule.

The easiest way to do this is with the yield return statement. Yield return is kind of strange: it lets the caller iterate over objects one at a time as the method produces them, and the method body only runs as far as the caller actually enumerates. LINQ to Objects relies on the same deferred execution. Instead of filtering and reducing a collection one criterion at a time (e.g. testing 1000 objects, then re-testing the 500 objects that passed, etc.), it passes each object through all the criteria before moving on to the next one.
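A tiny sketch makes the laziness visible. LazyDemo and its Produced counter are made-up names for this illustration; the counter simply records how far the iterator body has actually run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items the iterator body has produced so far.
    public static int Produced;

    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Produced++;
            yield return i;
        }
    }
}
```

Calling LazyDemo.Numbers() on its own runs none of the loop. Calling Any() on the result runs it only up to the first yield return, while Count() has to run it all the way to the end.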

In this case, we simply want to return as soon as a broken rule is encountered.

class CustomerValidator
{
    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        if (String.IsNullOrEmpty(c.FirstName))
        {
            yield return new RuleViolation("FirstName",
                "First name cannot be empty.");
        }

        if (String.IsNullOrEmpty(c.LastName))
        {
            yield return new RuleViolation("LastName",
                "Last name cannot be empty.");
        }

        // anchored so the whole string must consist of digits, spaces, ( ) + and -
        if (!Regex.IsMatch(c.PhoneNumber, @"^[\d ()+-]+$"))
        {
            yield return new RuleViolation("PhoneNumber",
                "Invalid phone number.");
        }
    }
}

See that? The method is still declared as returning a sequence, but it has three yield return statements that each produce a single object. With this, we can use a little bit of LINQ to break out of the method early:

    public bool IsValid(Customer c)
    {
        // are there any rule violations?
        return !GetAllRuleViolations(c).Any();
    }

I whipped up a little test app to prove this — IsValid vs GetAllRuleViolations.

CustomerValidator validator = new CustomerValidator();

// An invalid customer
Customer customer = new Customer()
{
    FirstName = "",
    LastName = "",
    PhoneNumber = "asdsd"
};

// do as little work as possible to work out if the entity is valid or not
Console.WriteLine("IsValid = ");
Console.WriteLine(validator.IsValid(customer));

// check everything for a full report
Console.WriteLine("GetAllRuleViolations().Count =");
Console.WriteLine(validator.GetAllRuleViolations(customer).Count());

Here’s what it produces. The indented lines are trace output written as each rule is tested. Note that IsValid stops as soon as the first rule fails, while the full report tests every rule.

IsValid =
        Testing first name
False
GetAllRuleViolations().Count =
        Testing first name
        Testing last name
        Testing phone number
3
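The "Testing ..." trace lines do not appear in the validator listing earlier in the post. One way to reproduce them is sketched below, assuming a hypothetical TracingCustomerValidator that reports each rule through a callback before checking it. The Customer and RuleViolation classes here are minimal stand-ins, not the originals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal stand-ins for the classes used in the post.
public class Customer
{
    public string FirstName;
    public string LastName;
    public string PhoneNumber;
}

public class RuleViolation
{
    public readonly string PropertyName;
    public readonly string Message;

    public RuleViolation(string propertyName, string message)
    {
        PropertyName = propertyName;
        Message = message;
    }
}

// Hypothetical variant of the validator that reports each rule as it runs.
public class TracingCustomerValidator
{
    private readonly Action<string> trace;

    public TracingCustomerValidator(Action<string> trace)
    {
        this.trace = trace;
    }

    public IEnumerable<RuleViolation> GetAllRuleViolations(Customer c)
    {
        trace("Testing first name");
        if (String.IsNullOrEmpty(c.FirstName))
            yield return new RuleViolation("FirstName", "First name cannot be empty.");

        trace("Testing last name");
        if (String.IsNullOrEmpty(c.LastName))
            yield return new RuleViolation("LastName", "Last name cannot be empty.");

        trace("Testing phone number");
        // A missing phone number is treated as invalid rather than throwing.
        if (!Regex.IsMatch(c.PhoneNumber ?? "", @"^[\d ()+-]+$"))
            yield return new RuleViolation("PhoneNumber", "Invalid phone number.");
    }

    public bool IsValid(Customer c)
    {
        // Any() stops enumerating at the first violation.
        return !GetAllRuleViolations(c).Any();
    }
}
```

Passing Console.WriteLine as the callback prints the trace to the console; passing a list’s Add method instead makes it easy to assert how many rules were actually evaluated.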