Monday, August 6, 2012

How valuable is documentation really?

"We have come to value ... working software over comprehensive documentation."

So states the Agile Manifesto. I was thinking about this when I ran across this article entitled, "I've Just Inherited an Application--Now What?" It's an essay on the value of documentation in understanding an unfamiliar system, and by that he means technical documentation: architectural-level diagrams, design documents, and so forth. (He even uses the term "code spelunking" which I promise I thought of before reading his blog!)

I admit to being a little perplexed by his premise. Most of my career has been spent working on legacy applications of one form or another, and slowly over time I have come to the opposite conclusion. In most cases I benefited very little from existing documentation, and its value decreased as the complexity of the code base increased. Indeed, on a list of desired artifacts in code spelunking, documentation appears low on my list.

Don't misunderstand, that's not to say documentation can't be helpful. The Manifesto doesn't say any documentation, but comprehensive documentation; and its authors don't say it's not valuable at all, but simply that they value working software more. Documents that are well written and up-to-date can indeed save hours of effort trying to understand an application.

But that's really the problem, isn't it? How many times does the UML diagram you find on the company wiki--or worse, emailed to you by a junior developer--accurately represent the actual code base? And how many class or sequence diagrams have you examined that have so many lines, boxes, and arrows, that it's impossible to actually establish the relationships with any clarity?

Software is constantly mutating, assuming it isn't being end-of-lifed. Requirements change, technology evolves, developers join and leave the team; and most of the time you have little advance warning of these changes. I have had product owners swear on holy relics that a particular product requirement would remain static forever, only to have it change months later as market demands shifted. I have been through migrations in database technology, application servers, build tools, and countless development frameworks. To be of any value in the future, documentation must be maintained in parallel with these changes.

Moreover, documentation costs money. A handful of organizations may benefit from employing a technical writer skilled enough both to read the code and to generate well-formed explanations, but in most cases technical documentation is written and maintained by the people who write the code. That is, developers. The time a developer spends crafting a diagram describing a subsystem is time not spent developing working software.

And bless their hearts, they may be excellent coders but they tend to be terrible writers.*

So what do I propose for understanding a system? First, the code itself. The code should be its own documentation. Good comments, yes, but variable names, method names, and the very structure of the code should reveal its purpose. Code is read far more often than it is written, a lesson sometimes lost on those who create it. Applying and enforcing a common set of coding standards, conducting rigorous code reviews (preferably via pair programming), and fostering a commitment to technical excellence all go a long way toward creating code that is self-documenting.

However well-written the code may be, it can really only provide a snapshot view into an application, and usually higher levels of abstraction are helpful. For this purpose, tests serve a critical need as documentation on multiple levels. Unit tests, for example, should not only validate the code, but should be written as a set of use cases for how a class or method is expected to behave. Automated acceptance tests that provide high-level descriptions of business logic are the gold standard of executable documentation. Following the principles of BDD, they should be written in such a way that any stakeholder can understand them. Their being automated means that they are always in sync, and always correct. A test failure means either the assumption about the behavior was wrong, or the code itself doesn't actually perform as expected.

In fairness, there are tools that can generate diagrams and documentation from the code (the original blog author's company makes some of them), and these can potentially have value in understanding an application's architecture, assuming the output is human consumable. In my experience, code that's too difficult to read will probably generate complex diagrams that are just as difficult to understand. But I'm willing to concede that if I'm code spelunking, more information is usually better than less.

The times I have found the most value in design artifacts are at the outset of projects. When greenfield code is being written, diagrams and documentation can help communicate ideas between developers and teams. A quick sequence diagram showing how you expect components to interact can be helpful in communicating exactly what you want written. However, the moment those designs become code, the design document itself becomes outdated. Automated testing should replace it as the primary documentation of the system.

And I think this is ultimately what the line I quoted from the Manifesto means. Working software is its own best documentation.

*Irony alert: I'm a developer, and I'm writing a blog.