Code coverage lies to you



In my previous article I talked about why it is a good idea to write unit tests. Now I want to talk about a different question: how many unit tests should we write?

How do we know when to stop writing unit tests? When we run out of ideas for tests? When we test all possible scenarios? When the clock hits 5 o'clock? How do we know that we have written enough tests? One commonly used measure is code coverage.

What is code coverage and how to use it

What is code coverage? It is usually a number that tells you how much of your code is "covered" by tests. To calculate it, coverage tools count the lines of code that were executed while running the test suite and divide that by the total number of executable lines. The result is reported as a percentage, like 60%.
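
To make the arithmetic concrete, here is a minimal sketch of that calculation in Java. The class and method names are made up for illustration. Real coverage tools instrument the code to learn which lines ran, while this sketch simply takes the set of executed line numbers as given.

```java
import java.util.Set;

// Hypothetical helper, for illustration only; not the API of any real coverage tool.
class LineCoverageSketch {

    // Percentage of executable lines that were executed at least once.
    static double lineCoverage(int executableLines, Set<Integer> executedLines) {
        if (executableLines <= 0) {
            throw new IllegalArgumentException("executableLines must be positive");
        }
        return 100.0 * executedLines.size() / executableLines;
    }

    public static void main(String[] args) {
        // A file with 10 executable lines; the test suite ran lines 1 to 6 only.
        Set<Integer> executed = Set.of(1, 2, 3, 4, 5, 6);
        System.out.println(lineCoverage(10, executed) + "%"); // prints 60.0%
    }
}
```

Note that nothing in this calculation looks at what the tests actually asserted, only at which lines ran.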

Code coverage is a pretty useful metric: it gives engineers a quantifiable figure that they can track over time. It lets us compare apples to apples, so we can see how much testing the code has now versus what it had in the past.

I often see development teams setting test coverage goals for their systems. Sometimes the code coverage number is used as part of the build verification process, failing the build if coverage is below a certain threshold. Some say that they need at least 60% of the code to be covered by tests, others insist on 80%, and some even go to the extreme of demanding 100% code coverage.

What is it telling us?

But how do we interpret code coverage numbers? If the test suite passes and all tests are green, our logic is correct, right? If those tests cover 100% of the application code, our application is 100% correct, right? This is exactly the trap that I have seen people fall into. Following this logic, can we say that an application with 80% code coverage is twice as correct (or has half as many bugs) as an application with "just" 40% code coverage?

Edsger W. Dijkstra argued that "Program testing can be used to show the presence of bugs, but never to show their absence". A green test suite shows that there are 0 bugs in the situations that we tested. It feels intuitive to extend this and say that there are no bugs at all. But the suite tells us nothing about the situations that we did not test.

"But wait!", you will tell me, "our code coverage shows 100%, which means that we tested everything." But coverage tools are pretty dumb: they just count the lines that your tests executed, not the logically possible situations. They do not check whether you tested with all possible input data. It is absolutely possible to have 100% coverage, run every line of code, and still not test all logical situations.

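Here is a small sketch of what that can look like. The pricing function and its discount rules are hypothetical, invented only for this illustration. Two tests, one per discount, execute every line, so line coverage would be 100%, yet the case where both discounts apply is never exercised.

```java
// Hypothetical pricing rule, invented for this example.
class DiscountSketch {

    // Prices are in cents; each flag adds its own percentage discount.
    static int finalPriceCents(int priceCents, boolean member, boolean coupon) {
        int discountPercent = 0;
        if (member) {
            discountPercent += 50;
        }
        if (coupon) {
            discountPercent += 60;
        }
        return priceCents * (100 - discountPercent) / 100;
    }

    public static void main(String[] args) {
        // These two calls together execute every line of finalPriceCents.
        System.out.println(finalPriceCents(10000, true, false)); // 5000
        System.out.println(finalPriceCents(10000, false, true)); // 4000

        // The untested combination: a 110% discount and a negative price.
        System.out.println(finalPriceCents(10000, true, true));  // -1000
    }
}
```

Every line was "covered" by the first two calls, but the bug lives in a combination of paths that no test visited.
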
How code coverage lies

Consider this primitive function as an example (the language does not matter):

double divide(double a, double b) {
  return a/b;
}

and we have a nice test for it:

void test_divide() {
  // two separate arguments; the last one is the tolerance for comparing doubles
  assertEquals(2.0, divide(2, 1), 1e-9);
}

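Code coverage will happily report 100%! But what happens if b is 0? Depending on the language, the function either throws or quietly returns infinity or NaN, and no test tells us whether that is what callers expect. All tests are green and code coverage is 100%, but there are still problems in the code!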
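
As a concrete illustration, assume the function is written in Java (the snippet above is language-agnostic, so this is only one possible reading). There, floating-point division by zero does not throw; it quietly returns Infinity or NaN:

```java
// Same divide function as above, assuming Java semantics for double.
class DivideSketch {

    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(divide(2, 1)); // 2.0, the only case the test checks
        System.out.println(divide(2, 0)); // Infinity, no exception is thrown
        System.out.println(divide(0, 0)); // NaN
    }
}
```

Whether Infinity and NaN are acceptable results is a design decision, and the existing test says nothing about it despite the 100% coverage.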

Now what?

So if 100% code coverage is not a good goal, then what is? Is it a worthy investment of a very expensive engineer's time to write tests just to reach a certain code coverage number? Can we say that 40% or 60% is good enough?

Reaching 100% (or any very high percentage) is an expensive and often wasteful venture: just think of all those getters and setters that are so common in modern OOP languages, with their primitive logic! Even 60% or 40% can make little sense, depending on the nature of your system.

I believe that the code coverage number does not matter. The quality of tests matters much more than their mere quantity. If, for example, we take the Pareto principle as a rough guide, we might expect that code coverage of just 20% could catch 80% of the problems. As long as the tests are good, their quantity is not important.

Be mindful

Overall, code coverage is a useful tool, but it can trick us into a false sense of safety, and the goal of reaching high coverage numbers can lead us to waste valuable resources. We should not mistake it for something that it is not. It is a good quantitative measure, not a qualitative one. In other words, it can show how big the test suite is, not how good it is.



Tags: testing unit tests code coverage