12/19/2009

Bowling Game Kata in C

Olve Maudal has shared his presentation of the Bowling Game Kata in C. You can find the slides directly here (pdf).

I'll add a link to my TDD in C Delicious collection.

6/30/2009

Coverage with lcov, and so what?


A while back we ran an experimental line coverage analysis on our acceptance test suite. The result was 68% on the code for the main control board. I got the result from the nightly build, mentioned it in the Daily Scrum, and prompted: "So, what do we think about it, should we track it?" Everyone on the team had a blank stare until finally a team member came forward: "Yeah, that's a good question. So what?"

Coverage is information. It is just that, an additional piece of information, not by any means the final truth. I don't remember who taught me this, but:


"If you take all your assert's away you still have the same coverage. You just ain't testing anything at all."


This has been explained here and of course in Brian Marick's classic How to Misuse Code Coverage (pdf).

Well, maybe good coverage cannot say anything about the quality of your tests, but poor coverage can certainly say a thing or two of the opposite nature. If your coverage is 20%, we can say quite confidently that you ain't there yet.
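
To make the point concrete, here is a minimal Unity-style sketch of such a non-test (the module and function names are made up): it executes production code, and therefore raises line coverage, without verifying anything at all.

#include "unity.h"
#include "temperature_filter.h"   /* hypothetical module under test */

/* Every line of the filter runs, so the coverage report looks great,
   but without a single assert nothing is actually verified. */
void test_filter_is_covered_but_not_tested(void)
{
    temperature_filter_init();
    temperature_filter_add_sample(42);
    temperature_filter_average();   /* return value silently ignored */
    /* no TEST_ASSERT_* here: full line coverage, zero confidence */
}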

I started with acceptance test line coverage, but the rest of this post is about unit test line coverage. Some embedded teams use gcov, and I have heard of people fiddling with the data to generate fancier reports. Being as lazy as I am, I didn't do it myself. I did what I'm good at and searched for what others have already done. I found lcov, a Perl tool that formats gcov data into reports.

We run lcov under Cygwin. You can get lcov for example from here, extract it, and execute "make install". Next, compile and link the unit tests with gcc using the flags "-fprofile-arcs" and "-ftest-coverage". We have a special build target for instrumenting the unit test executables with debug and coverage information, so that we don't unnecessarily slow down the bulk of the builds. Then execute your full suite just like you normally would.

In our case all .obj files from the test build are in the ./bin directory, and that's where all the coverage data files go. Our unit test script moves them to the ./bin/code_coverage directory, away from the .obj files, and we want the final HTML report to end up in ./build/test/code_coverage. Now we have the information necessary to create a shell script that does the actual analysis and reporting of the coverage data:

path=`dirname $0`    # resolve paths relative to the script's own location
lcov --directory $path/bin/code_coverage -b $path --capture --output-file $path/ic.info
genhtml -o $path/build/test/code_coverage $path/ic.info

Voilà, your disappointing(?) results are ready to be browsed in the generated HTML report.

What the heck, it's all green, even though you only have tests for simple utils files? This approach has a limitation: you only get coverage information for the files that are involved in your test suite. With a huge legacy codebase this yields far too promising a picture early on. Again, you need to think for yourself.

Experiment with coverage in your team. I think it's worth every penny, but even when you start closing in on 100%, remember to keep asking, "so what?"

2/06/2009

Learning to cope with legacy C

New responsibilities during the past year have been a great learning experience. The key learning is that now I really know how incompetent I am. I can't wait to move again and learn how many more things I do really badly or, even better, can't do at all. This is a brief story of one such finding during this journey.

For the past year we have focused on ATDD with our own framework written in Python. We have 200+ automated acceptance tests for the system. With unit tests, however, we have struggled. While we have over 100 of them (well, it's a start), with the exception of the latest ones they are not really meaningful.

What's different about the latest tests, then? They focus on a higher level. I'm not sure what these tests should be called, but "programmer tests" will do for now. I do believe unit tests should be focused when doing TDD, but, wait, wait, I have an excuse… The code is old. It has its dependencies, and while it is maybe not the worst case in the world, it is a pain to get anything compiled in isolation. The code has a responsibility-based structure (or should have had one), and this structure is expressed in the source code folder structure. Each of the responsible "modules", or folders, typically contains its own task. A typical task looks something like this:

task_specific_inits();

for (;;) {
    s = OS_wait_for_something();
    switch (s) {
    case 1:
        do_something1(s);
        break;
    }
}

Sometimes do_something1(s) is inlined and you get a bittersweet taste of those infamous 1000+ line functions. Other times you are lucky and the whole high-level event parsing is already done in its own function, something along the lines of do_something_with_the_event_from_X(s). This function continues the handling with a loooong switch-case, hopefully just calling further functions.

So, when we decide to test something inside a selected "module", or folder in our case, we compile and link a single test file, all the production code from that module/folder, the production code for everything considered utils (like linked lists), and fake everything else. For faking we use Atomic Object's CMock plus manually written stubs when appropriate. We choose the task's event handling as the place for injecting the test actions.
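
A rough sketch of what such a programmer test can look like is below. The module, event, and neighbour names are made up, and the mock calls follow the _Expect/_ExpectAndReturn conventions of CMock-generated code with Unity-style asserts; take it as an illustration of the approach, not our actual code.

#include "unity.h"
#include "motor_module.h"       /* production code under test (hypothetical) */
#include "mock_os_services.h"   /* CMock-generated fake of the OS interface */
#include "mock_display.h"       /* CMock-generated fake of a neighbouring module */

void setUp(void)
{
    motor_module_init();        /* bring the module to a known state */
}

/* Inject one event into the task's handling routine and let the
   generated mocks verify the interaction with the neighbours. */
void test_start_request_spins_up_the_motor(void)
{
    display_show_status_Expect(STATUS_STARTING);
    os_start_timer_ExpectAndReturn(MOTOR_RAMP_UP_TIMER, OS_OK);

    motor_module_handle_event(EVENT_START_REQUEST);

    TEST_ASSERT_EQUAL(MOTOR_STATE_RAMPING_UP, motor_module_get_state());
}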

We arrange the test execution environment as we wish by initializing all the parties to the expected state and teaching the mocked neighbours accordingly. We inject a single event, or a short sequence of events, into the task's handling routine and try to find ways to assert whether everything went as we wished. Sometimes we use this to learn what really happens when you feed in such-and-such an event. After all, the default assumption is that the code works; it has been in production for years. We want to make sure it stays that way when we change it. We have several options for observing the behavior:

1. Automatically generated mocks will tell us if the interaction was as expected
2. We can use the getters of utilities, like linked lists
3. We can sense the internal state of any of the production code files with a few nasty little tricks like #define STATIC (see the sketch after this list)
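
For option 3, the usual form of the trick is to hide the static keyword behind a macro that the test build defines away, so that the test file can reach file-scope variables with a plain extern declaration. A minimal sketch with made-up names, assuming the test build target passes -DUNIT_TEST to the compiler:

/* In motor_module.c (production code, hypothetical names): */
#ifdef UNIT_TEST
#define STATIC                /* test build: open up file scope for the tests */
#else
#define STATIC static         /* normal build: keep file scope intact */
#endif

STATIC int current_state;     /* invisible outside the file in production builds */

/* In the test file, the same variable can now be sensed directly: */
extern int current_state;

void test_stop_request_resets_internal_state(void)
{
    motor_module_handle_event(EVENT_STOP_REQUEST);
    TEST_ASSERT_EQUAL(MOTOR_STATE_IDLE, current_state);
}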

When the first test, and maybe her friend, is running, it is time to start refactoring your code. Refactoring your test code, that is. If you take a closer look at what you have done, you will most likely see one or two 300-line test cases that look pretty much the same. Now is a good time to start extracting helpers. When creating an event sequence to be run, you probably generate similar data structures; these can be extracted into functions. You probably make a bunch of similar assertions in many of your tests; these can be extracted into helper functions. And so on. Each refactoring is likely to reveal more opportunities for cleaning the code. This cannot be emphasized enough: it is important to keep the test code clean from the beginning. Otherwise you will end up with a 10 KLOC test file on your hands, and it is much more work to start cleaning it only at that point.
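
The helpers that emerge tend to be small and boring, which is exactly the point. A sketch, again with made-up names, of the kind of extraction meant here:

/* One helper injects the event sequence the tests kept constructing by
   hand, another bundles the assertions they kept repeating. */
static void inject_events(const int *events, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        motor_module_handle_event(events[i]);
    }
}

static void assert_motor_is_idle(void)
{
    TEST_ASSERT_EQUAL(MOTOR_STATE_IDLE, motor_module_get_state());
}

void test_start_then_stop_leaves_motor_idle(void)
{
    const int sequence[] = { EVENT_START_REQUEST, EVENT_STOP_REQUEST };

    inject_events(sequence, 2);

    assert_motor_is_idle();
}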

This is very far from TFD (test-first design). It is a battle to get some tests going so that we are in a better place to continue improving and changing the code. The code is not going to disappear anytime soon, so there will be lots of changes.

Why did it take us a year to get to this point? The blame is on me. I got bitten by the test bug while writing a really hard real-time firmware app with a former colleague a bunch of years back, and we learned that small, exact tests leading to small steps of coding result in zero debugging time. This was the type of software where we had earlier spent the majority of our time debugging with an oscilloscope and manually monitoring LED blinks with throwaway debugging code. During that experiment I saw the light (as did my colleague), and I thought that this is how firmware, too, should be written: write each line of code to make a small test pass. However, it is fairly rare in the embedded domain to get your hands on a greenfield project. This may not be characteristic of just embedded software, but of software in general today. We mostly write enhancements to existing products. Existing products in 2009 are not typically delivered with automated tests, and even less often were they developed with this in mind. There are going to be plenty of opportunities for battles like this. Letting go of the ideal of very low-level unit testing took me a year. It is still my ideal way of coding, but we cannot get there overnight with legacy code.

If getting the first tests in place sounds easy(?), calm down. It is only a starting point. You will notice how hard it is to test your code, for example because of scattered initialization routines, or because there is no structure in the first place. You should consider all these problems good things. They are indicators of places to improve. Those problems are in the code; building tests only makes them more visible. If you work on those problems, you should be able to see more dependency-breaking opportunities and eventually get to more focused tests. That's the plan at the moment.

Michael Feathers uses the term pinch point in his book about working with legacy code. A pinch point is a function or a small collection of functions that you can write tests against to cover changes in many more functions. I guess the event handlers for tasks are our first natural pinch points. That, at least, is the current step on the learning ladder for me. Hope the ladder won't fall.

James Grenning has also done a nice job of articulating the whole legacy-code testing process in C (link).

Atomic Object also presented the importance of refactoring the test code from the beginning (link).

1/27/2009

My Delicious Embedded TDD Links

After a hint from a friend I recalled that I actually did start to collect embedded TDD links on delicious.com a while ago.

So not having anything more interesting to do, I updated the collection a bit and placed a link on the sidebar.

My Delicious Links on Embedded TDD