Saturday, March 20, 2010

The False False Positive

Static analysis tools report potential bugs in source code by analyzing its structure for inconsistencies and flaws. Sometimes they get it right and sometimes they get it wrong, depending on how strong the analysis is and how complex the code is. These days, static analysis tools are getting more and more sophisticated, doing statistical analysis and interprocedural analysis (where information is tracked across function boundaries) over all the logical paths in the code, sometimes numbering in the millions. Dataflow analysis can track values across functions to produce bug reports that span multiple levels of calls.
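To make those multi-level reports concrete, here is a minimal, hypothetical sketch in Python (the file format and function names are invented for illustration). No single function looks wrong on its own; the defect only appears when a value is tracked across the calls:

    import os

    def load_config(path):
        """Return a settings dict, or None if the file is missing."""
        if not os.path.exists(path):
            return None  # the value an interprocedural analysis must track
        with open(path) as f:
            return dict(line.strip().split("=", 1) for line in f)

    def effective_timeout(settings):
        # Locally this looks safe: 'settings' is simply used as a dict.
        return int(settings.get("timeout", "30"))

    def connect(path):
        # Dataflow analysis follows the None returned by load_config()
        # through this call into effective_timeout(), where settings.get()
        # raises AttributeError - a report spanning two levels of calls.
        return effective_timeout(load_config(path))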

While increased sophistication means that static analysis tools can catch more problems with a higher degree of accuracy, the burden on the reviewer to interpret the results correctly increases as well. If you grep through some code for a pattern, you can quickly review (and dismiss) many of the results because you understand what your "analysis" is doing. With static source code analysis, what the tool is doing is much less apparent.

We see many engineers look at a complex bug report and not take the time needed to understand the problem and fix it. This is mostly because they don't understand what the static analysis tool is doing and how deeply it is analyzing the code. The result is a real bug being marked as a false positive - a "false false positive," if you will. These bugs then disappear off the queue, never to be seen again - a lost opportunity.
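Here is a hypothetical sketch of how that happens (the functions are invented for illustration). An engineer reviewing average() in isolation may reason "callers always pass data" and dismiss the report, but the tool traced a deeper path:

    def average(samples):
        # A tool reports: possible ZeroDivisionError when samples is empty.
        return sum(samples) / len(samples)

    def summarize(readings, threshold):
        # The filter below can produce an empty list, so the report above
        # is a real bug; dismissing it without tracing this path is
        # exactly the "false false positive".
        valid = [r for r in readings if r >= threshold]
        return average(valid)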

How do you minimize the number of false false positives? Lots of different ways:
  • Training and/or mentoring of the reviewer team helps reviewers learn how to properly disposition a defect.
  • Auditing of false positives, either periodically or continuously. Incorrectly dismissed defects are great learning opportunities.
  • Metrics that monitor the overall false positive rate, so individuals can be benchmarked against it to detect anomalies. If one engineer marks 80% of defects as false positives against a team average of 30%, that is likely a correctable problem (see the sketch below).
  • Double reviews of defects. Similar to pair programming, make sure each disposition gets "two pairs of eyes" on it.
By making triage more accurate, you get much more value from your tool and throw away fewer real bugs.
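As a minimal sketch of the metrics bullet above (the triage-log format and the outlier threshold are assumptions, not a prescription), the benchmark can be as simple as comparing each engineer's false positive rate to the team average:

    from collections import Counter

    def false_positive_rates(triage_log):
        """triage_log: iterable of (engineer, disposition) pairs, where
        disposition is either 'false_positive' or 'real_bug'."""
        marked, total = Counter(), Counter()
        for engineer, disposition in triage_log:
            total[engineer] += 1
            if disposition == "false_positive":
                marked[engineer] += 1
        return {e: marked[e] / total[e] for e in total}

    def flag_outliers(rates, team_average, tolerance=0.25):
        # Flag anyone far above the norm (e.g. 80% marked false positive
        # against a 30% average) as a candidate for an audit.
        return [e for e, r in rates.items() if r - team_average > tolerance]

Feeding in a triage log and printing flag_outliers(rates, team_average=0.30) surfaces the 80%-versus-30% case described above.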
