Timo's weblog: 02/2009

New responsibilities during the past year have been a great learning experience. The key learning is that now I really know how incompetent I am. I can’t wait to move again and learn how many more things I do really badly, or what would be even better, can't do at all. This is a brief story of one such finding during this joyrney.

For the past year we have focused on ATDD with our own framework written in Python. We have 200+ automated acceptance tests for the system. With unit tests we however have struggled. While we have over 100 (well, it’s a start) of them, without the exception of the latest ones they are not really meaningful.

What's different with the latest tests then? They focus on higher level. I’m not sure what these tests as programmer tests should be called, but a programmer test will do for now. I do believe unit tests should be focused when doing TDD, but, wait, wait, I have an excuse… The code is old. It has its dependencies, and while maybe not the worst case in the world, it is a pain to get something compiled in isolation. The code has responsibility based structure (or should have had), and this structure is expressed in the source code folder structure. Each of the responsible "modules", or folders, typically contain own task. A typical task looks something like this:

task_specific_inits();

for(;;) {
s = OS_wait_for_something();
switch(s) {
case 1:
do_something1(s);
break;
}
}

Sometimes do_something1(s) is inlined and you may get a bitter sweet taste of those infamous 1000+ line functions. Other times you are lucky and the whole high level event parsing is already done in own function, along with lines do_something_with_the_event_from_X(s). This function continues the handling with loooong switch case, hopefully just calling further functions.

So, when we decide to test something inside a selected "module", or a folder in our case, we compile and link single test file, all the production code from a single responsible module/folder, production code for everything considered utils, like linked lists etc., and fake everything else. For faking we use Atomic Object's Cmock and manually written stuff when appropriate. We choose the task handling for injecting the test actions.

We arrange the test execution environment as we wish by initializing all the parties to expected state and teaching the mocked neighbours accordingly. We inject a single event, or short sequence of events, into task's handling routine and we try to find ways to assert if everything went as we wished for. Sometimes we can use this to learn what really happens when you give such and such event. After all the default assumption is that the code works, as it has been in production for years. We want to make sure it stays that way, when we change it. We have several options for observing the behavior:

1. Automatically generated mocks will tell us if the interaction was as expected
2. We can use getters of utilities, like linked lists
3. We can sense the internal status of any of the production code files with few nasty little tricks like #define STATIC

When the first test, and maybe her friend, is running it is time to start refactoring your code. Refactoring your test code, that is. If you take a closer look on what you have done, you most likely see 1-2 300 lines long test cases, which look pretty much the same. Now it is a good time to start extracting helpers. When creating an event sequence to be run you probably generate similar data structures. These can be extracted into functions. You probably do a bunch of similar assertions on many of your test. These can be extracted to helper functions. And so on, and so on. Each refactoring is likely to reveal more opportunities for cleaning the code. This can't be emphasized more. It is important to keep the code clean from the beginning. Otherwise you will have a 10KLOC test file on your hands, and it is much more work to start cleaning it only at that point.

This is very far from TFD (test first design). It is a battle to get some tests going to be in better place to continue improving and changing the code. The code is not going to disappear anywhere soon, so there will be lots of changes.

Why it took us a year to get to this point? Blame is on me. I got bitten by the test bug while writing a really hard real-time firmware app with a former colleague bunch of years back, and we learned that small exact tests leading into small steps of coding lead into zero debugging time. This was type of SW where we earlier had spent majority of our time debugging the code with oscilloscope and manually monitoring led blinks with throw away debugging code. During that experiment I saw the light (as saw my colleague), and thought that this is how also firmware should be written. Write each line of code to make a small test pass. However it is fairly rare in embedded domain to get your hands on a green project. This may not be a characteristic of just embedded sw, but sw in general today. We mostly write enhancements to existing products. Existing products in 2009 are not typically delivered with automated tests, and even less so developed this in mind. There is going to be plenty of opportunities for battles like this. Letting go on the ideal very low level unit testing took a year for me. It is still my ideal way of coding, but we can not get there overnight with legacy code.

If getting first tests in place sounds easy(?), calm down. It is only a starting place. You will notice how hard it is to test your code for example because of scattered initialization routines or that there is no structure in the first place. You should concider all these problems as good things. They are indicators for places of improvement. Those problems are in the code, building tests only make it more visible. If you work on those problems, you should be able to see more dependency breaking opportunities and eventually get to more focused tests. That’s the plan at the moment.

Michael Feathers uses term pinch point in his book about working with legacy code. Pinch point is a function or small collection of functions that you can write tests against and cover changes in many more functions. I guess event handlers for tasks are our first natural pinch points. This at least is the current step on the learning ladder for me. Hope the ladders won’t fall.

James Grenning also made a nice job articulating the whole legacy code testing process in C language (link).

Atomic Object also presented the importance of refactoring the test code from the beginning (link).

Timo's weblog

2/06/2009

Learning to cope with legacy C