Failsafer software in practice

As mentioned earlier, Jules May held a workshop at SDD Conf recently, on how to write failsafe software. Having a system where stability is crucial, I went through its source code from top to bottom to see how it could be improved using his ideas. As the programming language here is C, things like having null objects were not fully applicable. Many other things were, though.

Most functions already started by checking for null pointers, invalid indices etc, but this could now be taken one step further. For example, some functions had a body like this:

if (someOtherFunction()) {
// 5-20 lines of code, representing the "happy path"
} else {
// log an error, and return null or similar
}

By inverting the test, the failing case got closer to the test. As this branch always ended with a return, the happy path could be outdented one level. A few functions had several of these constructs after each other, which were all rewritten the same way. The rest of the function was just the straight happy path.

In some cases the "else" branch did not return, but instead just did something other than the "then" branch. When there was code after this "if" statement, we could not just return. The solution was then to move this "if" statement into a separate function, where the branch with the least amount of code easily could return afterwards. The original function then became just "oneThing(); anotherThing();". It is now obvious that anotherThing() is always done after oneThing() here, regardless of what oneThing() is doing. When the "oneThing()" part is long, it is otherwise easy to throw in a "return" somewhere, which would be incorrect. Now that potential bug is impossible.

The code previously looked like this:

func() {
if (condition) {
// do step 1A, lots of lines
} else {
// do step 1B, 1 line
}
// do step 2
}

This I now turned into:

oneThing() {
if (!condition) {
// do step 1B, then return
}
// do step 1A
}

func() {
oneThing()
// do step 2, perhaps in anotherThing()
}

With the pattern above, one can move memory allocations and locks up from oneThing() to the original function func(). The matching calls to malloc() and free() then get closer together, and the same goes for lock/unlock. It becomes easier to verify that they pair up correctly, eliminating two entire classes of bugs.

There were also a couple of places where "createStructFoo()" functions created semi-invalid structures, which required that one out of a handful other functions was called afterwards before the structure was fully functional. For example, a "KeyValue" struct requires both the key and the value to be set, before anything else can be done on it. So, these functions were now merged into a "createStructFoo(key, value)" function, avoiding the invalid state altogether.

The simpler code revealed a few issues, including one case where a couple of unnecessary SQL requests were made. So, the code is now easier to read, harder to get to crash, and ever so slightly faster. The number of lines of code shrunk by about 150. This was far from the first "top to bottom" code review I've done on this code, but the new, very specific focus, still found ways it could be improved.