Some time ago I went through the full EMG code base, marking pointers received as function parameters as pointing to const data whenever possible. This led to some refactorings, primarily splitting many functions into two. The first one allocates a struct and fills it with values at the beginning, and clears everything at the end. In the middle it calls a second function, which uses the prepared data. When reading the code, it is now obvious that none of these values will be modified in the second function. The second function can also do an "early exit" without any risk of memory leaks. In some places it became obvious that the data could be stored on the stack instead of on the heap, resulting in both faster code and fewer places that can leak memory.
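To make the split concrete, here is a minimal sketch of the pattern. It is not taken from EMG: WorkData, findChar and findCharInCopy are made-up names, and the task (searching a copy of a string) is only there to have something to allocate and free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical working data; none of these names come from EMG. */
typedef struct {
    char *buffer;
    size_t length;
} WorkData;

/* Second function: only reads the prepared data. It owns nothing,
   so it can return early without anything to clean up.
   Returns the index of the first match, or -1 if there is none. */
static int findChar(const WorkData *data, char wanted)
{
    if (data->length == 0)
        return -1;                      /* early exit, no leak possible */
    for (size_t i = 0; i < data->length; i++) {
        if (data->buffer[i] == wanted)
            return (int)i;
    }
    return -1;
}

/* First function: prepares the data, calls the worker, cleans up. */
int findCharInCopy(const char *text, char wanted)
{
    assert(text != NULL);

    WorkData data;                      /* the struct itself lives on the stack */
    data.length = strlen(text);
    data.buffer = malloc(data.length + 1);
    if (data.buffer == NULL)
        return -1;
    memcpy(data.buffer, text, data.length + 1);

    const int result = findChar(&data, wanted);

    free(data.buffer);                  /* the only place that frees */
    return result;
}
```

Calling findCharInCopy("hello", 'l') goes through allocation, search and cleanup in one place, while all the return statements of the search live in a function that cannot leak.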
As a followup project, I recently started a second phase with two main goals:
- Declare variables as late as possible, instead of at the beginning of each function (strictly speaking, at the beginning of a set of statements surrounded by curly braces). This has been supported in C since C99, and it finally got too boring scrolling through tons of variable declarations before getting to some actual code.
- Declare pointers as const, using the "typeName* const variableName" syntax. The similar syntax "const typeName* variableName" says that the value the pointer points to will not be changed, but by having const right before the variable name instead, it says that the pointer itself will not be changed. These can of course be combined.
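As an illustration of the two const placements, here is a small sketch. The functions sumValues and negateValues are invented for this example and are not from EMG.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the values without modifying them. */
int sumValues(const int *values, size_t count)
{
    assert(values != NULL);

    /* Both kinds of const combined: the pointer is never re-pointed,
       and the data it points to is never changed through it. */
    const int *const end = values + count;

    int sum = 0;
    /* p points to const data, but p itself moves, so it is not const. */
    for (const int *p = values; p != end; p++)
        sum += *p;
    return sum;
}

/* Negates each value in place: the data changes, the pointer does not. */
void negateValues(int *const values, size_t count)
{
    assert(values != NULL);

    for (size_t i = 0; i < count; i++)
        values[i] = -values[i];
    /* values = NULL;   would not compile: the pointer itself is const */
}
```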
The first one found a bug: A function started with "int len = 0;", did some string manipulations, and then had "return len;" at the end. When moving the declaration down to its first usage, it turned out this variable was never actually updated. This was only a problem when several specific and unusual conditions were met, so it had never been noticed in production. Still, it is of course corrected now.
C, in every version except the very latest (C23), has an interesting exception to the rule that variables can be declared anywhere you can have a statement: you cannot declare a variable directly after a label. So, when this has been desirable, I've wrapped the code after the label within curly braces. Having "case 0: { int x = 0; ... }" is completely fine in any version of C. However, this is a bit ugly and messes up the indentation, and there is actually a nicer version: "case 0: ; int x = 0; ... ". By adding a semicolon directly after the label, there is no problem declaring a new variable on the next line.
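Both variants side by side, in a small sketch with a made-up function (describedLength is not from EMG):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical function: returns a length that depends on kind. */
int describedLength(int kind, const char *text)
{
    assert(text != NULL);

    switch (kind) {
    case 0: {
        /* Variant 1: wrap the statements after the label in braces. */
        const int len = (int)strlen(text);
        return len;
    }
    case 1: ;
        /* Variant 2: an empty statement directly after the label,
           then the declaration on the next line. */
        const int doubled = 2 * (int)strlen(text);
        return doubled;
    default:
        return -1;
    }
}
```

One difference worth knowing: with the braces, len goes out of scope at the closing brace, while doubled stays in scope until the end of the switch. A later case could refer to it by name, but its initializer would not have run there.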
As the variables got moved down, any initial parameter checks leading to early exits became more visible as they were now at the very top of the function. It also made it clear that some initializations could be postponed. Most of these could be moved around automatically by the compiler, but not when the values came from a function call which may have side effects. It's perhaps not even measurable, but it feels good that the code is now ever so slightly faster than before.
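A sketch of what a postponed initialization can look like. All names are hypothetical, and the call counter exists only to make the side effect visible; the compiler cannot move the call to lookupDefault past the early exits on its own, because it cannot ignore that side effect.

```c
#include <assert.h>
#include <stddef.h>

static int lookupCalls = 0;     /* counts calls, only to show the effect */

/* Stand-in for a function call with a side effect. */
static int lookupDefault(void)
{
    lookupCalls++;
    return 42;
}

int scaleOrDefault(const int *value, int factor)
{
    assert(factor >= 0);        /* assumed precondition of this sketch */

    /* The early exits are now the first thing in the function. */
    if (factor == 0)
        return 0;
    if (value != NULL)
        return *value * factor;

    /* Declared and initialized at first use, so the lookup only
       happens on the path that actually needs it. */
    const int fallback = lookupDefault();
    return fallback * factor;
}
```

With the old layout, "int fallback = lookupDefault();" would have sat at the top of the function and run on every call, including the ones that return early.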
The combination of the two goals greatly improved the readability of the code. Instead of this (the code was written some 25 years ago, when this type of thing was popular):
typeName variableName;
...
if ((variableName = function()) == someValue) { }
we now get this, which is much nicer:
typeName const variableName = function();
if (variableName == someValue) { }
It has the same number of lines, but there is only one thing happening on each one. It also reduces the line length of the if statement, making it easier to read.
Another fun one is this:
typeName variableName = value1;
...
if (condition)
    variableName = value2;
This can now instead be written like this:
typeName const variableName = condition ? value2 : value1;
It is now clear that this variable will never be updated, and both its possible values are in one place.
Not too long ago I learned that C structs can be initialized in their declaration. All fields not mentioned are automatically set to 0. So instead of this mess:
structName variableName;
memset(&variableName, 0, sizeof(variableName));
variableName.field1 = value1;
variableName.field2 = value2;
...
we can do this, avoiding lots of potentially problematic repetitions of the variable name:
structName variableName = {
    .field1 = value1,
    .field2 = value2,
    ...
};
In both cases new fields can be added to the struct without any risk of uninitialized values. However, the new syntax only works if all values are known when the variable is declared. As declarations now happen much later, this is the case much more often. Many of these structs could also be marked as const.
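A small sketch comparing the two styles on a made-up struct (Box and both functions are invented for this example). The depth and name fields stand in for fields added to the struct after the functions were written.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int width;
    int height;
    int depth;          /* imagine this field was added later */
    const char *name;   /* and this one too */
} Box;

/* Old style: clear everything, then assign field by field. */
Box makeBoxOld(int width, int height)
{
    assert(width >= 0 && height >= 0);

    Box box;
    memset(&box, 0, sizeof(box));
    box.width = width;
    box.height = height;
    return box;
}

/* New style: everything in the declaration, which can now be const.
   Fields that are not mentioned (depth, name) are set to 0 / NULL. */
Box makeBoxNew(int width, int height)
{
    assert(width >= 0 && height >= 0);

    const Box box = { .width = width, .height = height };
    return box;
}
```

Neither function had to change when depth and name were added, and both return them zeroed.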
/Daniel