Tuesday, November 9, 2010

Static Analysis Metrics - A Window into Development

You can't manage what you can't measure, right?  In addition to detecting problems in source code, static analysis can be used to produce metrics that provide insight into productivity, code quality, security and more.  We all know the dangers of managing solely by metrics, yet metrics serve as a highly useful window into your software development process.  In this post, we most certainly won't focus on the general challenges of using metrics, because they've been written about in many other forums.  Instead, we'll discuss some static analysis metrics that, when combined with other information, can be genuinely useful.

Static analysis is typically integrated at the system build level.  These days, most organizations build their code frequently, and those builds are convenient points at which to generate static analysis results.  If you run a nightly build, you can get static analysis output on a daily basis.  If you run continuous integration, you get output at whatever frequency you choose (if static analysis takes longer than a continuous integration cycle because it hasn't been tuned, simply run it on every second or third cycle).  Metrics can be used to get a sense of where you are (status) and to point to potential problem areas (alerts).  Thresholds can be set to spur action, such as blocking a release or triggering a code review or design review.
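
To make the idea of gating concrete, here is a minimal Python sketch of a build-time check; the metrics.json file, its field names, and the threshold values are all hypothetical stand-ins for whatever your static analysis tool actually exports and whatever limits your team agrees on.

    # Hypothetical gate: fail the build when exported static analysis metrics
    # exceed agreed-upon thresholds. The metrics.json layout is an assumption.
    import json
    import sys

    THRESHOLDS = {
        "new_critical_defects": 0,   # any new critical defect blocks the release
        "defects_per_kloc": 1.0,     # example density ceiling
    }

    def main(path="metrics.json"):
        with open(path) as f:
            metrics = json.load(f)

        failures = []
        for name, limit in THRESHOLDS.items():
            value = metrics.get(name)
            if value is not None and value > limit:
                failures.append(f"{name} = {value} exceeds threshold {limit}")

        if failures:
            print("Static analysis gate failed:")
            for failure in failures:
                print("  " + failure)
            sys.exit(1)  # a non-zero exit blocks the build or triggers a review

        print("Static analysis gate passed.")

    if __name__ == "__main__":
        main(sys.argv[1] if len(sys.argv) > 1 else "metrics.json")

A script like this would run right after the analysis step in the nightly or continuous integration build.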

Every organization has different priorities, and thus each should track a different set of metrics to stay aligned with its goals.  Here are just a few common metrics that can come from static analysis tools.

Management Level Metrics

Most static analysis tools can easily generate output such as:
  • Lines of code, number of new lines of code, churn
  • Number of defects outstanding, number of new defects introduced
  • Defects per thousand lines of code (defect density) - see the computation sketch after these lists
  • Number of defects fixed so far, number of defects fixed in last period
  • Comments per line of code 
  • Cyclomatic complexity, function points
Supporting Metrics to Ensure Compliance
  • Code coverage for tests (100% or near 100% should be attained)
  • Average time to resolve issues
  • Number of defects marked as false positives (or low priority)
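
To make the arithmetic behind a few of these concrete, here is a small Python sketch that computes defect density and comment density from raw counts; the numbers are invented and the variable names are not tied to any particular tool.

    # Illustrative metric calculations from raw counts (all values are made up).
    lines_of_code = 250_000
    comment_lines = 40_000
    open_defects = 180              # defects outstanding per the analysis tool
    defects_fixed_total = 95
    defects_fixed_last_period = 12

    defect_density = open_defects / (lines_of_code / 1000.0)  # defects per KLOC
    comment_ratio = comment_lines / lines_of_code             # comments per line of code

    print(f"Defect density: {defect_density:.2f} defects per KLOC")
    print(f"Comment ratio:  {comment_ratio:.2%}")
    print(f"Fixed so far:   {defects_fixed_total} ({defects_fixed_last_period} last period)")
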
A few considerations about this data:

Breakdowns
The entire codebase should be measured and tracked; however, convenient breakdowns can be created to make the information more actionable.  For instance, data could be collected on a team-by-team basis (or component by component).  Per-individual metrics can also be calculated to home in on specific problem areas or, better yet, to highlight improvements worth rewarding.  In addition, team leads may be interested in tracking information on a function-by-function basis.

All defect-related metrics should be broken down by priority and/or severity.  Metrics that lump in low-priority issues become diluted.  This is particularly important if your acceptance criteria require all critical and high-priority defects to be fixed before release.
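
As a sketch of what such a breakdown might look like, assuming your tool can export each defect with a component and a severity field (the records below are invented):

    # Hypothetical breakdown of exported defects by component and by severity.
    from collections import Counter

    defects = [
        {"component": "network", "severity": "critical"},
        {"component": "network", "severity": "low"},
        {"component": "ui",      "severity": "high"},
        {"component": "storage", "severity": "high"},
    ]

    by_component = Counter(d["component"] for d in defects)
    by_severity = Counter(d["severity"] for d in defects)

    print("Defects by component:", dict(by_component))
    print("Defects by severity: ", dict(by_severity))

    # Example acceptance criterion: no outstanding critical or high defects.
    blocking = by_severity["critical"] + by_severity["high"]
    print("Release-blocking defects:", blocking)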

What Is Measured
Because static analysis usually operates at the build level, you can get meaningful statistics on a per-codebase basis.  Thus, if you have multiple build targets (e.g., different operating systems or devices), you can produce statistics for each.  You can find target-specific bugs as well as bugs in shared code.
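
One way to picture this, assuming each exported defect record carries the build target and the file it was found in (the records and path convention here are purely illustrative):

    # Hypothetical split of defects into shared-code and target-specific buckets.
    defects = [
        {"file": "src/shared/parser.c", "target": "linux-x86"},
        {"file": "src/shared/parser.c", "target": "rtos-arm"},
        {"file": "src/linux/driver.c",  "target": "linux-x86"},
    ]

    shared = [d for d in defects if d["file"].startswith("src/shared/")]
    per_target = [d for d in defects if not d["file"].startswith("src/shared/")]

    print(f"Shared-code defects:     {len(shared)}")
    print(f"Target-specific defects: {len(per_target)}")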

Absolute Versus Relative
A number without context provides almost no value.  Let's say the tool spits out a defect density of 1 bug per 1,000 lines of code.  What can you do with that information?  The data becomes rich information when you can compare it to other numbers.  The most common approach is to benchmark against industry standards.  Coverity has an interesting database of open source defects from their Department of Homeland Security grant.  While comparing your codebase with open source statistics may not be ideal (wouldn't it be great if you could benchmark against your competitors?), it provides at least some real-world data you can use to compare and contrast.  Are you above or below the average, and by how much?

Another common way to interpret the numbers is to evaluate them over time.  Take a baseline - a snapshot in time - and then measure against it to see how you are trending.  A convenient time to take a baseline is at the beginning of a branch pull.  Some organizations have the goal of holding the line - meaning "no new harm."  Quite simply, the goal is to maintain the same level of quality as before any new development effort.  Others at least want to see improvement over time, showing that they are making progress.  The high-level defect density and the overall defect count are very typical KPIs (key performance indicators) chosen by development organizations.
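
A minimal sketch of the "no new harm" check, assuming the baseline snapshot taken at the branch pull and the current snapshot are stored as simple key/value metrics (the values here are invented):

    # Compare current metrics against a baseline snapshot ("no new harm").
    baseline = {"defect_density": 0.80, "open_defects": 160}
    current = {"defect_density": 0.85, "open_defects": 172}

    for name, base_value in baseline.items():
        delta = current[name] - base_value
        trend = "worse" if delta > 0 else "holding or better"
        print(f"{name}: baseline {base_value}, current {current[name]} "
              f"({trend}, delta {delta:+.2f})")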

Frequency
Note again that the frequency is as often as you run static analysis.  As part of a system build, analysis typically runs regularly: every evening, every few days, or on every continuous integration cycle.  Advanced organizations also make static analysis available at the developer level so that developers can run it on demand.  Klocwork has some strong capabilities in this area.  In this way, developers can proactively identify and fix potential problems prior to check-in.  Giving developers the power to view and improve the metrics as they work yields better system-level metrics faster and greatly reduces the chance of having to deal with an emergency later in the development cycle.

Trending
All data should be stored and accessible so that you have historical context.  Graphical charts enable quick interpretation of results, and the historical context shows how you are doing in relative terms.  Start with a baseline and track how you are doing from that point onward.
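
For example, a few lines of Python with matplotlib (an assumption on tooling; any charting package or spreadsheet works just as well) can turn the stored history into a trend chart:

    # Plot defect density across successive builds; the history values are made up.
    import matplotlib.pyplot as plt

    builds = ["b100", "b110", "b120", "b130", "b140"]
    defect_density = [1.10, 1.05, 0.98, 1.02, 0.95]  # defects per KLOC at each build

    plt.plot(builds, defect_density, marker="o")
    plt.axhline(defect_density[0], linestyle="--", label="baseline")
    plt.xlabel("Build")
    plt.ylabel("Defects per KLOC")
    plt.title("Defect density trend")
    plt.legend()
    plt.savefig("defect_density_trend.png")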

Data Consistency
Static analysis tools are complex, with many tweakable settings that let them understand the myriad of different codebases they can analyze.  Think about what kind of bugs would be useful to find in a 10,000-line embedded device (like a stack overflow) versus a 10-million-line server application (like a concurrency bug).  If not set up properly, the static analysis tool can make it difficult to keep the data stable enough to produce consistent numbers to compare over time.  For instance, if you upgrade the static analysis tool, new and improved checkers may suddenly change the defect count.  New checkers will find new bugs, which may falsely signal a sudden increase in the number of defects in your codebase.  Those bugs were always there :-).  Similarly, an improved checker may suddenly report far fewer false positives, but perhaps at the cost of a few real bugs that are no longer found.

If analyses can be run in multiple places (for instance, developers running the analysis tool locally on their machines), then discrepancies can occur if the configurations and the code being analyzed are not identical.  Even changing settings that you would never expect to affect data consistency can mysteriously change the numbers.  While static analysis tools are designed to find bugs, that doesn't mean they don't have bugs themselves.  These tools do a lot of complex analysis behind the scenes and don't always produce consistent results in every possible scenario.

In a future blog post, we'll look at some sample dashboards so that you get a sense for what can be used for reporting and how these data points can be used.
