Monday, July 13, 2009

Usage Models for Static Analysis

Static analysis (or source code analysis) hasn't yet hit the mainstream. Leading organizations are using it to gain a quality and security advantage over their competitors, but it's not yet part of the standard tool chain. With regard to tooling, nearly every software organization starts with a source code repository, bug tracking database, compiler, automated test frameworks, etc. They standardize on these functions because a software development team needs to collaborate effectively to produce a final product.

However, other parts of the tool chain are often (but not always) left to individual preference, such as editors like emacs, vi, or Eclipse. Other tools that often fall into this category include architectural tools, debuggers, and static analysis tools such as Lint. Rarely is it a requirement for every developer to look at all Lint issues; Lint is more of an option, left to each developer to decide if and when to use it. Newer static analysis vendors like Coverity, Klocwork, and Fortify Software are bringing to the marketplace a more holistic solution, meant for every developer in an organization. The market has evolved quickly in recent years with more team- and organization-oriented tools.

The deployment model needed for a given organization depends heavily on the goal it wants to accomplish. Success and failure can take many different forms when using static analysis. Some organizations are perfectly happy finding the occasional killer bug and leaving the rest of the reported defects unaddressed. Others want to use the tool as a cornerstone of their code review efforts and to find every possible problem that could exist. These different goals call for different analysis options and different deployment models.

Process Structure of Code Analysis
This is a simplification, but for most static analysis deployments the following roles need to be addressed:
  • Administrator - who installs and manages the tool
  • Triager - who reviews the reports that come from the tool and prioritizes the results
  • Fixer - who fixes the problems
  • Verifier - who verifies the fix worked and didn't introduce new problems
These roles can be filled by internal and external staff in various combinations. The most common deployment models we see are the following:

"The static analysis guy" -- this is where one developer has been dubbed "the static analysis guy" and performs all of the above functions. This person runs the tool on their machine periodically, usually for a couple of minutes every day or at a milestone during the latter part of a release cycle. This developer often reviews the defects and either fixes the bugs or distributes them to other developers. This option is great for companies who care only about the "low hanging fruit" bugs. It has several advantages:

+ one person who is responsible for everything and therefore "one neck to wring"
+ few handoffs

But...

- Process is inherently unscalable
- No checks and balances in place to make sure the person responsible hasn't made any mistakes and isn't using the tool suboptimally
- Putting someone in charge of static analysis takes someone "off the bench". Usually a senior developer is put in charge of static analysis, causing him or her to invest considerable time in becoming a tools expert rather than focusing on building features for the product.

Regrettably, some organizations that fully intend to roll static analysis out across the organization never get past this point.

"Special teams" - this deployment option takes advantage of different specialties. The administration side is taken care of by the tools group. The triaging is typically handled by a separate team, such as a security review team or a quality review team, which examines the problems up front and then distributes them to the developers who handle the fixing. The verification process is usually handled by the security or quality team to ensure the fixes are done right and that no new issues are introduced. The advantages of this model include:

+ Great checks and balances in play that help the overall quality improve. Healthy tensions between triagers, fixers and verifiers drive quality.
+ Saves significant time from your most precious resources - your developers. The largest cost for many organizations using static analysis is in reviewing the defects (see earlier post on the hidden cost of static analysis).
+ Highly scalable structure

But of course it is not without its costs:
- Handoffs of defects require more coordination and can introduce errors
- Development skills are required in every role. A tools engineer must be able to understand code and be able to tune the analysis based on developer feedback. Security reviewers, quality engineers, etc. must also have software development skills.

"Managed service" -- this option keeps the software development team focused on its core competencies. When people buy static analysis, they are essentially buying the defects that they end up fixing. They aren't buying the administration burden, and they aren't buying the false positive reports that come from the tool -- they are buying the reported defects they take action upon. In this model, the administration is managed by an outside consultant. The triaging is handled by a specialist security or quality review team. The fixing is handled by an off-shore software development team - the same type of team that regularly handles maintenance for software products. The verification is handled by either a skilled QA team or the review team. There are several key advantages and disadvantages:

+ Consultants are specialists in their field - so you get the best practices implementation without false starts
+ Consultants are responsible for hand-delivering only the defects you want to fix, so you don't have to "lift a finger" in managing the tool or triaging the results. This lets the team focus on what it does best - building features - and is one of the main reasons why managed services in general have taken off.
+ Resources can be scaled up and down as needed; for instance, you can have a larger team handle the triage of a big backlog and then, once the backlog is down to zero, have a smaller team deal with the bugs caused by incremental changes. For most organizations, administration is only a part-time job.
+ Consulting organizations bear the brunt of scheduling. You don't have to worry about sick or vacation days or deal with HR issues.

However,

- Working with consultants requires some handoff both in startup and ongoing work.
- Consultants also require some overhead to manage.
- In particular for fixing, consultants can't know the code as well as your developers. Some collaboration is required in order to make the consultants efficient and effective, just as in the case when an offshore team takes responsibility for a codebase.

"Every developer is responsible for quality" -- in this model, each and every developer owns quality and security. The software developers are responsible for reviewing the results and fixing them. Verification can also be done by the developers, depending on how much you tolerate the "fox guarding the henhouse" approach. The advantages:

+ Puts more of the responsibility for quality into the hands of the developers
+ Brings defects to developers at the absolute earliest time - right as they are coding, rather than waiting for results from a nightly build.
+ Helps developers check in better code, a benefit for Agile development and continuous integration models.

But,

- Requires a consistent approach. There is a wide variance between the best and the worst developers in a given software development group. A common saying in the static analysis community is that the developers who ignore or rebel against static analysis tend to be the ones who need it the most.
- Lack of checks and balances. Over time, some developers will just write code that makes the tool "shut up", even at the cost of less stable code. If improper incentives are put in place, this kind of suboptimal behavior can result.
- While each developer's daily time to review bugs may be small, summed across every developer every day it can amount to a significant cost. For most organizations, the time spent on triage significantly surpasses the time it actually takes to fix the bugs.
- Training is required for every developer as well as every future developer who comes into the organization.

"Hybrid approach" -- nearly every other combination exists in some way, shape, or form. Many organizations opt for a hybrid of the above: most typically a tools group responsible for administering the tool (in-house or outsourced), a review team (in-house or outsourced engineers), and an in-house development team to fix the reported defects. In this way, developers are focused on what they do best, building features, not on becoming experts at running a static analysis tool. In addition, having a separate triage team saves time, freeing your developers from having to wade through "don't cares", false positives, and low priority items.

Have it Your Way
Using a good static analysis tool can significantly improve the quality and security of code. Whether the goal is to address compliance, find every last possible memory leak, or occasionally find some bugs, there is a deployment model that should work. Each method has its pluses and minuses. As needs change over time, the deployment model should change as well. We don't doubt that there are quite a few other configurations out there for getting the most out of static analysis tools.
