Why Documentation is Mandatory for Maintainable Software [shorts #15]

Software engineering is more than just writing source code. Like any other engineering discipline, the resulting product should adhere to the specifications, be verifiable, and maintainable. However, one more aspect is crucial to every engineering practice: Documentation, which constitutes more than just the handbook delivered to customers or users. Read on to learn how this vital aspect helps other developers understand the source code more quickly and how it ensures repeatability and a deterministic engineering process.

Documentation Starts With Clean Code Guidelines

Software documentation entails more than just comments or static text. You should remember that the main goal of software documentation is to ensure that other devs (and you) can understand a code fragment as easily as possible. The faster and easier it is to understand, the easier it is to maintain and debug, should it become necessary later.

Adhering to just a few straightforward clean-code practices already drastically improves code quality and maintainability. If you only have time to focus on one thing, it should be choosing descriptive names for variables and methods. Let’s look at an example:

@Override
public boolean isValid(String s, boolean nullable) {
    return s == null && nullable || s != null && s.length() == 2 && CODES.contains(s);
}

From this snippet alone, it is hard to tell what the method does. It’s some sort of validation function for a String variable, but without additional context, its purpose is difficult to grasp at a glance. Let’s improve the snippet a bit by adding parentheses and renaming CODES:

@Override
public boolean isValid(String s, boolean nullable) {
    return (s == null && nullable) || (s != null) && (s.length() == 2) &&
        VALID_COUNTRY_CODES.contains(s);
}

Ok, so now it becomes clear that we’re dealing with two-letter country codes. But still, the boolean expression could benefit from being split up and the method and its parameter from being renamed:

@Override
public boolean isValidCountryCode(String code, boolean nullable) {
    final boolean isNullAndNullable = (code == null && nullable);
    final boolean isNotNullAndValid = (code != null) && (code.length() == 2) &&
        VALID_COUNTRY_CODES.contains(code);
    return isNullAndNullable || isNotNullAndValid;
}

The final snippet makes it very easy to understand that a code is valid if it’s null and nullable or if it is not null and valid. Valid means that it has two characters and appears in the list of allowed codes.

The first snippet is significantly shorter, but software design is not about saving characters. The faster you and other programmers can understand your code, the better. The three snippets are functionally identical (the order of operations is implicitly clear: && takes precedence over ||). However, the initial version is much more difficult to read (and make sense of semantically) than the named version (or even the one with superfluous parentheses). Don’t worry about writing short code: the compiler will take care of that for you, anyway! :)

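There’s a slight caveat: The named version always evaluates both expressions before combining them. If the second boolean expression took a long time to execute, it might be worth returning immediately once the result is known instead of performing the check at the end to save CPU time.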
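Here is a minimal sketch of what such an early return could look like. The class name CountryCodeValidator and the contents of VALID_COUNTRY_CODES are assumptions made for illustration only; the original snippets don’t show either.

```java
import java.util.Set;

final class CountryCodeValidator {
    // Hypothetical allow-list; the real contents of VALID_COUNTRY_CODES are not shown in the article.
    private static final Set<String> VALID_COUNTRY_CODES = Set.of("DE", "FR", "US");

    static boolean isValidCountryCode(String code, boolean nullable) {
        // A null code is valid only if null values are allowed; nothing else needs checking.
        if (code == null) {
            return nullable;
        }
        // Cheap length check first, so the lookup below is skipped for obviously invalid input.
        if (code.length() != 2) {
            return false;
        }
        // The (potentially expensive) lookup runs only when all cheaper checks have passed.
        return VALID_COUNTRY_CODES.contains(code);
    }
}
```

Each guard clause still reads like one sentence of the validation rule, so the early returns cost little in readability.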

Make Comments Count

The second important thing to remember is that comments should only be used where they enhance the code. Too many comments add noise, and there’s no reason to state the obvious. Instead, keep it short and sweet: Focus on side effects, potential return values and what they mean, input value contracts, exceptions and why they might be thrown, and so on.

I believe a function should do one thing, and it may be worth splitting if it does too many things. Each function is more or less atomic, and its name should already be a perfect descriptor of what it does. So, in most cases, adding comments does not help unless some restrictions are not obvious when looking at the function as a black box. Let’s look at a real-world example. Imagine you called the following function, which is part of an asset-management application:

final BaseAsset firstMatch = assetService.findFirstByTicker("AAPL");

While the name already mentions that the function returns the first match, it doesn’t state how it decides what should be the first match. It also doesn’t make it clear what happens if no assets match. However, looking at the interface’s comments helps clear things up:

/**
 * Returns any asset with any type or state with the matching ticker. If multiple assets match, the
 * one with the lowest ID is returned. If no match exists, the function returns null. The asset ticker is sanitized
 * using {@link #sanitizeTicker(String)} before the search.
 *
 * @param ticker The non-null ticker value to find
 * @return Any matching asset or null if none exists
 */
public BaseAsset findFirstByTicker(@NonNull String ticker);

A-ha! From this description, we learn a few things: First, the function expects a non-null string as its argument. Second, it sanitizes input values, so the caller doesn’t have to take care of that. Conveniently, the comment also links to the method that performs the sanitization. Third, we learn that an additional null-check on the return value is needed to handle nonexistent assets. And lastly, we know that if multiple assets match, the one with the lowest ID is returned.
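On the calling side, that contract might translate into something like the following sketch. BaseAsset and AssetService are reduced to hypothetical stand-ins here (the real types are not shown in this article), and AssetLookup.describe is an invented helper that only illustrates the null-check.

```java
// Hypothetical stand-ins for the real types; only the documented contract is taken from the comment.
record BaseAsset(long id, String ticker) {}

interface AssetService {
    /** Returns the matching asset with the lowest ID, or null if none exists. */
    BaseAsset findFirstByTicker(String ticker);
}

final class AssetLookup {
    /** Builds a short description and handles the documented null return explicitly. */
    static String describe(AssetService assetService, String ticker) {
        // No sanitizing here: the documentation states that the service takes care of it.
        final BaseAsset firstMatch = assetService.findFirstByTicker(ticker);
        // The documentation states that null means "no match", so check before dereferencing.
        if (firstMatch == null) {
            return "No asset found for ticker " + ticker;
        }
        return "Asset #" + firstMatch.id() + " (" + firstMatch.ticker() + ")";
    }
}
```

Without the interface comment, the caller would have to guess whether a missing asset results in null, an exception, or an empty placeholder object.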

Which comments don’t help? The ones that state the obvious. For example:

/**
 * Returns the first match with the given ticker
 *
 * @param ticker The ticker to find
 * @return The first match
 */
public BaseAsset findFirstByTicker(@NonNull String ticker);

Or, another fan favorite:

public BaseAsset findFirstByTicker(@NonNull String ticker) {
    // Sanitize the search ticker
    final String searchTicker = sanitizeTicker(ticker);
    // Find the asset in the repository or return null if none matches
    return repository
        .find("ticker = ?1 order by id asc", searchTicker)
        .firstResultOptional()
        .orElse(null);
}

As stated before, if you use descriptive names, such comments become static noise, artificially bloating your code base and making it more challenging to maintain.

Documentation Evolves With the Code

You must remember that comments (and software documentation as a whole) are not just static text you write once and then forget. The documentation must evolve with the code to reflect its current state and specification. Every comment is therefore one more thing to keep up to date, and comments become a liability, especially when you have too many that only repeat the obvious.

So, if you take one thing from this article, it should be:

Choose descriptive names that explain themselves. Augment non-obvious details using comments. More extensive information belongs mainly in the specification documents.

The tests are then derived from the specification, and the tests serve as an additional way of documenting the expected behavior.
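As a rough sketch, here is how the documented behavior of findFirstByTicker could be turned into executable checks. Everything except the three rules themselves is an assumption: InMemoryAssetService is a hypothetical stand-in for the real repository-backed service, the BaseAsset record is simplified, and sanitizeTicker is assumed to trim and upper-case its input because the article doesn’t show what it actually does. A real project would likely use a test framework instead of the small check helper.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Simplified, hypothetical asset type for this sketch.
record BaseAsset(long id, String ticker) {}

// Hypothetical in-memory stand-in for the repository-backed service.
final class InMemoryAssetService {
    private final List<BaseAsset> assets;

    InMemoryAssetService(List<BaseAsset> assets) {
        this.assets = List.copyOf(assets);
    }

    // Assumed sanitization (trim and upper-case); the real rules are not shown in the article.
    static String sanitizeTicker(String ticker) {
        return ticker.trim().toUpperCase(Locale.ROOT);
    }

    BaseAsset findFirstByTicker(String ticker) {
        final String searchTicker = sanitizeTicker(ticker);
        return assets.stream()
            .filter(asset -> asset.ticker().equals(searchTicker))
            .min(Comparator.comparingLong(BaseAsset::id))
            .orElse(null);
    }
}

final class FindFirstByTickerSpec {
    static void check(boolean condition, String rule) {
        if (!condition) {
            throw new AssertionError("Specification violated: " + rule);
        }
    }

    // Each check is named after one rule from the documentation.
    static void runAll() {
        final InMemoryAssetService service = new InMemoryAssetService(List.of(
            new BaseAsset(42L, "AAPL"), new BaseAsset(7L, "AAPL"), new BaseAsset(9L, "MSFT")));
        check(service.findFirstByTicker("AAPL").id() == 7L, "lowest ID wins when multiple assets match");
        check(service.findFirstByTicker("TSLA") == null, "null is returned when no asset matches");
        check(service.findFirstByTicker(" aapl ").id() == 7L, "the ticker is sanitized before the search");
    }
}
```

Calling FindFirstByTickerSpec.runAll() throws an AssertionError naming the violated rule if the implementation ever drifts from the specification.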

The Bottom Line

Most of the code should not need additional comments — it should explain itself. If it’s too complicated, it may need refactoring into smaller units. Again, that does not mean that the specifications don’t need to be written down. Some technical problems fill hundreds of pages in a journal, and that’s also where more extensive technical details should be explained.