Merge redundant assemblies

Lately I have become a big opponent of a popular anti-pattern: people insisting on splitting up their application tiers/layers into 5-10 separate Visual Studio projects and adding references between them. Double that number of projects if you want a corresponding unit test project for each layer.

In fact, merging them has become one of the first steps I take when inheriting a legacy code base. If I were writing a book on refactoring Visual Studio solutions, I would call this refactoring Merge redundant assemblies. Here’s a diagram:

You don’t need to split your code across 50 gazillion projects. Next time you think of creating a new project in your solution, please remember the following:

  • Visual Studio projects are for outputting assemblies. Namespaces are for organising code.
  • Assemblies only need to be split if your deployment scenario demands it. Putting a client API library into a separate assembly makes sense because the same API assembly may be used between many apps, or in different App Domains. Deploying your domain model or data layer into a separate assembly does not make sense, unless other apps need them too.
  • Each additional project slows down your build. A giant project with hundreds of classes will compile faster than a smaller number of classes split amongst multiple projects.
  • Crossing assembly boundaries hurts runtime performance. Your app will start up slower, the ability to perform inlining and OS optimization is reduced, and additional security overhead is enforced between assemblies. Assemblies are supposed to be big and heavy; loading lots of little ones goes against the CLR.
  • You don’t need one test project for each assembly. One giant tests project is normally fine. The only case I have seen where it made sense to have separate test projects was for a client API which duplicated many of the server internal class names and we wanted to avoid overlap/namespace pollution.
  • Common sense should be used to enforce one-way dependencies. Not assembly references.
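To make the "namespaces are for organising code" point concrete, here is a minimal sketch of three layers living in a single project. It is written in C++ rather than C#, on the assumption that the idea carries over unchanged, and every name in it (app, domain, data, services, Customer, CustomerRepository, greeting) is hypothetical. The one-way dependency is kept by convention only: services uses data, data uses domain, and nothing points back.

```cpp
#include <string>
#include <vector>

// Hypothetical layers, all compiled into ONE project/binary.
namespace app {

namespace domain {
// Domain layer: knows nothing about storage or services.
struct Customer {
    int id;
    std::string name;
};
}  // namespace domain

namespace data {
// Data layer: depends on the domain layer, never the other way round.
class CustomerRepository {
public:
    void add(const domain::Customer& customer) { customers_.push_back(customer); }

    // Returns nullptr when no customer has the given id.
    const domain::Customer* find(int id) const {
        for (const auto& customer : customers_) {
            if (customer.id == id) return &customer;
        }
        return nullptr;
    }

private:
    std::vector<domain::Customer> customers_;
};
}  // namespace data

namespace services {
// Service layer: sits on top of both lower layers.
inline std::string greeting(const data::CustomerRepository& repo, int id) {
    const domain::Customer* customer = repo.find(id);
    if (customer == nullptr) return std::string("Unknown customer");
    return "Hello, " + customer->name;
}
}  // namespace services

}  // namespace app
```

If one of these layers ever needs to be deployed on its own, its namespace can be lifted into a separate project at that point.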

Patrick Smacchia has a good list of valid/invalid use cases where separate assemblies are appropriate here.

Are your applications ‘legacy code’ before they even hit production?

When a business has custom software developed, it expects that software to last for a long time. To do this, software must A) continue operating — by adapting to changing technology needs, and B) continue to be useful — by adapting to changing business needs.

Businesses’ needs change all the time, and all but the most trivial of software will require modification beyond its original state at some point in its life, in order to remain useful. This is sometimes known as brownfield development (as opposed to greenfield), and from my experience, accounts for a good two thirds to three quarters of all enterprise software development.

Anyway, time for a story. A company has a flash new business intelligence system developed (in .NET) to replace an aging mainframe system. Everything goes more-or-less to plan, and the system is delivered complete and on time. It’s robust and works very well — everyone is satisfied.

Now, six months have passed since the original go-live date, and the owners have a list of new features they want added, plus a few tweaks to existing functionality.

Development begins again. After a few test releases, however, new bugs start to appear in old features that weren’t even part of phase two. Delivery dates slip, and time estimates are thrown out the window as search-and-destroy-style bug fixing takes over. Developers scramble to patch things up, guided only by the most recent round of user testing. Even the simplest of changes take weeks of development, and endless manual testing iterations to get right. The development team looks incompetent, testers are fed up, and stakeholders are left scratching their heads wondering how you could possibly break something that was working fine before. What happened?

The illusion of robustness

The robustness of many systems like this is an illusion — a temporary spike in value that’s only reflective of the fact it has been tested, and is known to work in its current state. Because everything looks fine on the surface, the quality of source code is assumed to be fine as well.

From my experience, a lot of business source code is structured like a house of cards. The software looks good from the outside and works perfectly fine, but if you want to modify or replace a central component — i.e. a card on which other cards depend — you pretty much have to re-build and re-test everything above it from scratch.

Such systems are written as legacy code from the very first day. By legacy, I mean code that is difficult to change without introducing bugs. This is most commonly caused by:

  • Low cohesion – classes and methods with several, unrelated responsibilities
  • High coupling – classes with tight dependencies on each other
  • Lack of unit tests

Low cohesion and high coupling are typical symptoms of code written under time-strapped, feature-by-feature development, where the first version that works is the final version that goes to production.

Once it’s up and running, there’s little motivation for refactoring or improving the design, no matter how ugly the code is — especially if it will require manual re-testing again. This is the value spike trap!

The tragedy of legacy code

Back to the story. If a developer on the project realises the hole they’ve fallen into, they can’t talk about it. This late in the process, such a revelation could only be taken in one of two ways — as an excuse, blaming the original designers of the software for mistakes made today — or as a very serious admission of incompetence in the development team from day one.

The other developers can’t identify any clear problems with the code, and simply assume that all brownfield development must be painful. It’s the classic story: when granted the opportunity to work on a greenfield project, you’re much happier. Unconstrained by years of cruft, you’re free to write clean, good code. Right? But if you don’t understand what was wrong with the last project you worked on, you’ll be doomed to repeat all of its mistakes. Even with the best of intentions, new legacy code is written, and without knowing it, you’ve created another maintenance nightmare just like the one before it.

Clearly, this is not a good place to be in — an endless cycle of hard-to-maintain software. So how can we break it?

Breaking the cycle

I believe that Test Driven Development (TDD) is the only way to keep your software easy-to-modify over the long-term. Why? Because:

  1. You can’t write unit tests for tightly-coupled, low-cohesion code. TDD forces you to break dependencies and improve your design, simply in order to get your code into a test harness.
  2. It establishes a base line of confidence — a quantifiable percentage of code that is known to work. You can then rely on this to assert that new changes haven’t broken any existing functionality. This raises the value of code between ‘human tested’ spikes, and will catch a lot of problems before anyone outside the development team ever sees them.
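As a rough illustration of the first point, here is a minimal sketch of what breaking a dependency to get code into a test harness can look like. It is in C++ and assumes no particular test framework. All of the names (Clock, Greeter, FixedClock, test_greeter) are made up for the example. The class under test depends on a small interface instead of the real system clock, so a test can hand it a fake.

```cpp
#include <cassert>
#include <string>

// Hypothetical seam: the code under test depends on this interface,
// not on the real system clock.
class Clock {
public:
    virtual ~Clock() = default;
    virtual int hour() const = 0;  // hour of day, 0-23
};

// Code under test. It receives its dependency through the constructor.
class Greeter {
public:
    explicit Greeter(const Clock& clock) : clock_(clock) {}

    std::string greet() const {
        return clock_.hour() < 12 ? "Good morning" : "Good afternoon";
    }

private:
    const Clock& clock_;
};

// Test double: reports whatever hour the test asks for.
class FixedClock : public Clock {
public:
    explicit FixedClock(int hour) : hour_(hour) {}
    int hour() const override { return hour_; }

private:
    int hour_;
};

// A unit test in the plainest possible form.
inline void test_greeter() {
    FixedClock morning(9);
    FixedClock afternoon(15);
    assert(Greeter(morning).greet() == "Good morning");
    assert(Greeter(afternoon).greet() == "Good afternoon");
}
```

Had Greeter read the system clock directly, the only way to test the afternoon branch would be to wait until after lunch.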

TDD gives you confidence in your code by proving that it works as intended. But object-oriented languages give you a lot of freedom, and if you’re not careful, your code can end up as a huge unreadable mess.

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” – Martin Fowler

So how do you write code that humans can understand? Enter TDD’s partner in crime, Domain Driven Design (DDD).

DDD establishes a very clear model of the domain (the subject area that the software was written to address), and puts it above all technological concerns that can put barriers of obfuscation between the business concepts and the code used to implement them (like the limitations of a relational database schema). The end result is very expressive entities, repositories and service classes that are based on real life concepts and can be easily explained to non-technical business stakeholders.

Plus, DDD provides a clear set of rules around all the fiddly stuff like relationships and responsibilities between entities. In short, DDD is simply object-oriented code done right.

Putting it into practice

So how do you get to a point where you can use all this? There’s a lot to learn: an entire universe of information on top of your traditional computer science/software development education. I’ve only scratched the surface here in trying to explain why I think it’s important, but if you want to learn more:

  • Read books. How many books on software development have you read this year? Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans is pretty much the DDD bible, and, just like GoF, should be mandatory reading for any serious developer. If you’re stuck in legacy code hell right now though, have a read of Michael Feathers’ Working Effectively with Legacy Code. It’s all about breaking dependencies and beating existing code into test harnesses so you can get on with improving it without fear of unintended consequences.
  • Read blogs. TDD/DDD is a hot subject right now, and blogs are great for staying on top of all the latest tools and methods. They’re also a good place for learning about TDD/DDD’s support concepts like dependency injection, inversion of control, mocking, object/relational mapping, etc.
  • Join mailing lists — DomainDrivenDesign and ALT.NET (if you’re that way inclined) are two good ones with a reasonably high volume of great intellectual discussion. You can gain a lot of insight reading other people’s real-life problems, and watching the discussion evolve as potential solutions are debated.
  • Practise. Grab an ORM, a unit test framework/runner, an IoC container and start playing around.

I believe the only way to make software truly maintainable long-term (without becoming legacy code) is to use TDD and DDD. Together, they’re all about building confidence — confidence that you can make changes to a system without introducing any bugs. Confidence that other people can understand your code and the concepts it represents. And most importantly, confidence that you’re learning, and aren’t making all the same mistakes every time you start a new project.

Troubleshooting Windows module dependencies

I have just got a new computer at work, and over the past week I have been installing all the software that I like to use. One tool I rely on is IISAdmin, a program that sits in the system tray and allows you to run multiple IIS websites under non-server editions of Windows. Unfortunately, when I tried to install it under Windows Vista, it failed with the message “Error 1904. Module C:\Program Files\iisadmin\ztray.ocx failed to register. HRESULT -2147024770. Contact your support personnel.”

IISAdmin setup error 1904

Attempting to manually register the module with RegSvr32 didn’t work either:

The module "ztray.ocx" failed to load.

I couldn’t find anything searching for either of the error messages, so I did some digging on the module itself. It turns out that ztray.ocx is an old (1997) ActiveX control that allows programs to add an icon to the system tray.
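As a side note, the HRESULT in the setup error can be decoded by hand, and doing so already hints at the cause. The sketch below is a small C++ helper that splits the value into its fields using the same bit layout as the Windows HRESULT_FACILITY and HRESULT_CODE macros. The helper and struct names (decode_hresult, HresultParts) are hypothetical, not part of any Windows API.

```cpp
#include <cstdint>

// Fields of an HRESULT, following the bit layout used by the Windows
// HRESULT_FACILITY / HRESULT_CODE macros.
struct HresultParts {
    bool failure;            // top bit set means the call failed
    std::uint32_t facility;  // which subsystem raised the error
    std::uint32_t code;      // low 16 bits: the error code itself
};

// decode_hresult is a hypothetical helper, not a Windows API.
inline HresultParts decode_hresult(std::int32_t hr) {
    const std::uint32_t bits = static_cast<std::uint32_t>(hr);
    return HresultParts{(bits >> 31) != 0, (bits >> 16) & 0x1FFF, bits & 0xFFFF};
}
```

For -2147024770 (0x8007007E in hex) this gives a failure with facility 7 and code 126. Facility 7 is FACILITY_WIN32, and Win32 error 126 is ERROR_MOD_NOT_FOUND, "The specified module could not be found". That usually means either the module itself or something it depends on is missing.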

I eventually found Dependency Walker, a tool that scans Windows modules for dependencies. I opened up ztray.ocx to see if it was missing anything.

Viewing ztray.ocx with Dependency Walker

As you can see, ztray.ocx makes calls to a module called msvbvm50.dll, part of the Visual Basic 5.0 run-time. This package is present in Windows XP, but appears to have been dropped from Vista. Luckily it is still available as a free download from Microsoft. Installing it solved the dependency problem, and I was able to install IISAdmin successfully.

Generating documentation with Doxygen

A few days ago, I had my first experience with Doxygen, an open-source documentation generator similar to Javadoc.

After a few hours of documenting my code to a satisfactory level, I had a very professional-looking set of HTML documentation. At work the next day, I used Doxygen to generate documentation for a Visual C# class library, which had been documented with XML comments.

Doxygen supports most C-derived programming languages including C, C++, Java and C#. It also allows you to define custom pages and sections; you are not limited to code documentation.

Documenting a method is very simple:

    /// @brief Calculate the sum of two values.
    ///
    /// Adds the two values together and returns their sum.
    ///
    /// @param[in] aa The first value.
    /// @param[in] bb The second value.
    ///
    /// @return The sum of both values.
    ///
    /// @note This method is deprecated!
    ///
    /// @see operator+()
    int add(int aa, int bb);
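Doxygen also has a dedicated @deprecated command, which might be a better fit than @note for flagging old methods, because deprecated items are collected on their own page in the generated output. Here is a small compilable sketch along those lines. The sum() function and the file description are invented for the example.

```cpp
/// @file
/// @brief Small arithmetic helpers (hypothetical example file).

/// @brief Calculate the sum of two values.
///
/// @param[in] aa The first value.
/// @param[in] bb The second value.
///
/// @return The sum of both values.
inline int sum(int aa, int bb) { return aa + bb; }

/// @brief Calculate the sum of two values.
///
/// @deprecated Use sum() instead.
///
/// @see sum()
inline int add(int aa, int bb) { return sum(aa, bb); }
```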

A full list of Doxygen commands is available.