December 2000
Volume 3 • Number 1
SOFTWARE QUALITY MANAGEMENT
Applying Quantitative Methods to Software Maintenance
Software maintenance processes can generate large amounts
of data that are useful in evaluating a maintenance operation.
The challenge is often finding ways to make sense of the
data, breaking it down into meaningful indicators of quality,
productivity, and predictability. Data from three years
of software maintenance activities were used in the analysis.
Detailed results for the first year are presented in this
article, with references to the following years' analyses.
The questions that prompted the analysis and the answers
to the questions, as well as the follow-up results of a
major process change, are included.
Key words: corrective maintenance, defective fixes,
quantitative analysis, software process, stable process
by Ed Weller, Bull HN Information Systems, Inc.
INTRODUCTION
A major release of Bull Information Systems' GCOS 8
Operating System created the need to understand in detail
the operation of its software maintenance team (SMT), the
group responsible for software maintenance (maintenance in
this discussion refers to corrective maintenance, that is,
fixing defects reported by customers). By looking at the SMT
process data, the company wanted to find what was working
well and duplicate those processes, as well as identify less
effective processes and eliminate them. It also wanted to
measure the typical process metrics to allow year-to-year
comparisons to evaluate the impact of process changes, as
well as to evaluate the performance of the team against perceptions
of its performance. This article concentrates on the metrics
used by the SMT and shows how quantitative methods can be
used to evaluate maintenance work. The evaluations cover three
years of data. Several of the process changes introduced as
a result of the studies are also covered.
DATA CHARACTERISTICS
The GCOS 8 Operating System (exclusive of third-party products)
includes about 7 million lines of source code split across
about 320 product IDs (PIDs). A major release in 1995 resulted
in an expected flow of customer reported defects. As these
defects were repaired, the company monitored a number of the
process metrics to evaluate the effectiveness of the repair
process. The GCOS 8 Operating System is divided into logical
groupings, or PIDs, that might be as large as 400,000 lines
of source code and contain up to 300 modules. In some cases,
several PIDs combine to provide the larger product areas within
the operating system, such as the memory manager, COBOL Compiler,
or I/O Supervisor.
In 1995, 82 percent of the fixes were isolated to 5 percent
of the (320) PIDs, and in 1996, 84 percent of the fixes were
isolated to 5 percent of the PIDs. By concentrating on the
15 PIDs with the highest change volume, the data analysis
effort was manageable. In three cases, PIDs that covered the
same product area were combined, so most of the charts show
only 12 data points. The 1996 data were extended to include
a 13th PID.
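The concentration of fixes in a small number of PIDs can be checked with
a short calculation. The following is a minimal Python sketch, not the
tooling used in the study. It assumes one record per fix, each tagged with
its PID; the function name share_of_top_pids and the sample PID labels are
hypothetical.

```python
from collections import Counter


def share_of_top_pids(fix_pids, n_top):
    """Return the fraction of all fixes that fall in the n_top busiest PIDs.

    fix_pids: iterable with one PID label per fix (hypothetical input shape).
    n_top:    how many of the highest-volume PIDs to include.
    """
    counts = Counter(fix_pids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fixes supplied")
    if n_top < 1:
        raise ValueError("n_top must be at least 1")
    # most_common() returns (pid, count) pairs sorted by descending count.
    top_total = sum(count for _, count in counts.most_common(n_top))
    return top_total / total


if __name__ == "__main__":
    # Illustrative data only: ten fixes spread over three made-up PIDs.
    sample = ["A"] * 6 + ["B"] * 3 + ["C"]
    print(share_of_top_pids(sample, 1))
    print(share_of_top_pids(sample, 2))
```

With real data, n_top would be the number of PIDs selected for study (15
in this article), and the result could be compared with the 82 percent and
84 percent figures reported above.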
Grouping by PIDs appears to be a logical approach, in that
PIDs have a common technical domain or product area, and for
the most part, the people working on problems in each PID
are the same. Problem analysis, resolution, and verification
methods within each group tend to be consistent, so differences
in the metrics from the different PIDs might indicate processes
that should be transferred from one group to another. Since
this grouping may also be sensitive to the relative complexity
of the different product areas, as well as the familiarity
of the maintenance staff with the product areas, some judgment
was required when evaluating these differences.
SMT PROCESS
The SMT was set up as a separate organization to allow a
dedicated team of maintenance specialists to focus on customer
reported problems. Key elements of the process include daily
reviews of open problems, pairing specialists on problems
that have not been resolved within goals (or at the request
of the person working the problem), and critical problem reviews
(CPR) for especially difficult or longstanding problems. Over
several years, the productivity of the SMT, measured by the
effort per system technical action request (STAR), more than
doubled. Metrics are collected using an internally developed
problem tracking system called problem analysis and solution
system (PASS). This system records the usual data elements
seen in today's problem tracking systems. Some of the
58 fields include date opened/closed, customer, priority,
assigned to, closure code, and PID. This system also provides
links into the company's configuration management system
to allow the specific code transmittals issued to correct
a problem to be tracked against that problem.
Additional data include a separate database for all fixes
shipped as site-specific corrections (SSCs) and emergency
patches (EPs). In particular, this database is used to track
replacements to EPs and the reason for the replacement (such
as bad fix, regression, or fix overlap, in which the current fix
overlays a previous one). Also, all fixes are inspected, and the inspection
results are captured in a site database that has inspection
results starting in 1990. The primary linkages between these
databases are the STAR number and PID.
Although the SMT uses a large number of metrics, the following
are of the most interest to Bull's customers and its
management as indicators of quality and predictability.
- Response time by problem priority level, as measured
by entry to and exit from the software maintenance team
(open/closed date in PASS)
- Inspection hit ratio, or the percentage of
inspection meetings of fixes that yield a major defect (a
major defect is one that is visible to the end user). This
is more useful than the usual "defects per thousand
lines of code" measure of inspection effectiveness.
- Defective fix ratio or recidivism ratio. This
is a measure of defective fixes shipped to customers, and
does not include defects found via inspection or test, although
the company does have these numbers and uses them to identify
above/below effectiveness in inspection and test as a means
of identifying process improvements.
Quality of the corrective maintenance process
in the eyes of Bull's customers means the defect is fixed
on the first attempt. Response time as measured in the company's
problem tracking system is the end-to-end time in the SMT
for the final fix transmitted to the source control system.
Typically, a work-around, avoidance, or temporary fix is provided
on a site-specific basis as soon as possible. The permanent
fix that is made available across the customer base goes through
a longer development and test cycle, which is the response
time used in this article.
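The three metrics described above can be expressed as simple ratios per
PID. The sketch below is an illustration under stated assumptions, not the
PASS implementation: the Fix record and its field names are hypothetical,
and it assumes exactly one inspection per fix, so the inspection hit ratio
is computed per fix rather than per inspection meeting.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Fix:
    """One shipped fix (hypothetical record layout)."""
    pid: str
    opened: date                     # entry into the maintenance team
    closed: date                     # final fix transmitted to source control
    inspection_found_major: bool     # inspection found a major defect
    defective_after_shipping: bool   # fix later replaced as a bad fix


def metrics_by_pid(fixes):
    """Return response time, hit ratio, and recidivism ratio for each PID."""
    groups = defaultdict(list)
    for fix in fixes:
        groups[fix.pid].append(fix)

    result = {}
    for pid, items in groups.items():
        n = len(items)
        result[pid] = {
            "fixes": n,
            # End-to-end days between open and close, averaged over fixes.
            "mean_response_days": sum((f.closed - f.opened).days for f in items) / n,
            # Share of fixes whose inspection found a major defect.
            "inspection_hit_ratio": sum(f.inspection_found_major for f in items) / n,
            # Share of shipped fixes that proved defective.
            "recidivism_ratio": sum(f.defective_after_shipping for f in items) / n,
        }
    return result
```

Keeping the three values together per PID makes it straightforward to
build the by-PID comparisons and scatter plots discussed later in the
article.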
WHAT DID THE COMPANY WANT TO DISCOVER?
Once these metrics are selected as the quality measure of
the work, as seen by Bull's customers, what questions
might one ask about them that will reveal something (useful)
about the SMT process? This article looks at the following:
- Is there a relationship between the volume of STARs and
response time?
- Is there a relationship between the PID in which the defect
is repaired and response time?
- Is there a relationship between the volume of STARs or
PID and quality?
- Does inspection effectiveness affect quality?
- Does response time affect quality?
Answers to these questions should enable the company to answer
a number of process related questions:
- Is the company applying its resources optimally?
- Would a detailed look at the data show obvious areas for
process improvement?
- Does the volume of change or length of response time affect
quality in any way? (There was a perception in some areas
that faster response time was accomplished by rushing
the fixes and tacitly accepting lower quality.)
Introducing inspections in 1990 to 1993 had reduced the recidivism
ratio to one-third of the preinspection values (Weller 1993).
Over the four years the company gained another 33 percent
improvement but had stabilized at a 4 percent ratio. Bull
was looking to better understand the underlying processes
to see what could be changed to improve further.
ANALYSIS TECHNIQUES
The data available for analysis were limited by what was
collected in the problem reporting system, the site inspection
database, and the EP database kept by the change control
board that manages EPs. ("Emergency patch," while historically
a meaningful definition, now means an object code fix shipped
between releases.) The company did not have the luxury
of adding to the problem reporting system the additional data
necessary to conduct rigorous experiments. The analysis was
a "do what you can with the available data" project.
Scatter plots, correlation analysis, and detailed comparisons of metrics
across PIDs were used to indicate trends, relationships, and
best-case/worst-case process differences between the extreme
performance groups. The drawing in Figure 1 illustrates this
process. A large amount of data is available for analysis, but
the best way to look at the data is not obvious.
DATA RESULTS AND INTERPRETATION
In the following figures, a STAR is a system technical action
request (problem report) that may be caused by a product defect,
user misunderstanding of product documentation, duplicate
error report, and so on.
In the following charts, response time is the
end-to-end time spent in the SMT for analysis, problem correction,
inspection, verification, and release into the source code
control system. Recidivism is the defective fix
ratio for the problem report. Inspection hit ratio
is the percentage of inspections of fixes that find major
defects.
Figure
2 shows the response time for priority A (highest) STARs
across a group of 12 PIDs. In Figure 2, PIDs A, H, and I have
the largest positive (longer response time) variation. Products
A and H were developed in the same project, and are consistently
on the outer limits in all of the charts in this article.
This project was grandfathered into the release
without being inspected, which caused poor quality product
turnover to the maintenance group. Product I will be discussed
in the next section.
Lessons Learned
- Inspections work. (This is a no-brainer, but the decision at the
time of the grandfathering not to inspect completed work was
clearly wrong. There is always time to do it right.)
- The company needed to define exit criteria for turnover
from development to maintenance.
DEVELOPMENT IMPACT ON MAINTENANCE
Figure
3 shows a plot of the recidivism ratio by PID. Again,
PID A is clearly out of the box, with a high response time
and lower quality. The analysis of the problems with this
PID indicates the design was not documented and that it was
not inspected. Without design documentation, the only way
the team could decipher the design was by reverse engineering
the code. The high defect ratio that was visible meant the
rest of the code could not be trusted to reflect the required
design, and the original designer was not available to question.
This is probably the worst-case scenario for software maintenance.
Two PIDs (I and L) have a zero defect ratio. Two items were
noted as possible causes: long familiarity with the product
by the maintainers, and recent source rolls that
cleaned up one of the PIDs. Also, PID I had a
longer response time (see Figure 2), which might indicate
a relationship between response time and quality, which will
be investigated later. Source rolls are a method of integrating
a series of overlapping changes into clean source
code. One of the characteristics of the legacy, proprietary
language used in parts of the operating system is that there
is a tradeoff between frequent source rolls and the cost of
the activity.
Lessons Learned
- Evaluate source roll (source-code cleanup)
frequency to ensure the proper balance between the cost
of the source roll and increased maintenance effort.
- Improve product familiarity (another no-brainer; however,
if the work force is optimally allocated, moving people
around will not work, so product training may be the only
way to achieve this end. Ultimately, this may mean on-the-job
training or experience.)
RESPONSE TIME VS. RECIDIVISM RATIO
The
discussion of the data in Figure 3 suggests response time
might have an impact on quality. This relationship is shown
in Figure 4. There does
not appear to be sufficient correlation to draw any conclusions.
The correlation is low, and the point above 10 percent is
the now infamous PID A.
Lessons Learned
- Response time did not appear to impact the quality of
the work, which was contrary to a view that the faster response
time had been accomplished by rushing the fixes
and accepting lower quality.
Figure 5 plots the inspection hit ratio (percentage of inspections
of fixes that find defects) against the volume of fixes to see if a higher
change rate affected the inspection results. This
set of data has one of the higher correlation values in this
analysis (R = .63), an indication that as the volume
of inspections in a product area increases, the team finds
more defects in the rework. The reasons could include:
- More defects caused by a higher volume of work, providing
more opportunities for the inspection teams
- More inspections allow the team to become more proficient
in finding defects
As
a test for the first hypothesis, the company looked at the
relationship of inspection hit ratio to recidivism, shown
in Figure 6.
Readers may have heard that defective products stay
defective through the entire debug cycle. Figure 6 is
an attempt to see if this holds true for fixes. It has the
highest correlation (R = .68) seen so far, but it is still
questionable that product quality can be predicted by inspection
hit ratio. If PID A in the upper right corner is removed,
the relationship (R = .56) is less conclusive.
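The effect of a single extreme point on a correlation coefficient, as
with PID A here, can be examined by computing R with and without that
point. The following Python sketch uses the standard Pearson formula; the
function names and the sample values are hypothetical and are not the
study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def r_with_and_without(points, drop_label):
    """Return (R for all points, R with the point named drop_label removed).

    points: dict mapping a group label to an (x, y) pair.
    """
    all_xy = list(points.values())
    kept_xy = [xy for label, xy in points.items() if label != drop_label]
    r_all = pearson_r([x for x, _ in all_xy], [y for _, y in all_xy])
    r_kept = pearson_r([x for x, _ in kept_xy], [y for _, y in kept_xy])
    return r_all, r_kept


if __name__ == "__main__":
    # Made-up values: four ordinary groups and one extreme group "X".
    sample = {"P1": (1, 1), "P2": (2, 2), "P3": (3, 1), "P4": (4, 2), "X": (20, 20)}
    print(r_with_and_without(sample, "X"))
```

A large drop in R after removing one point suggests the apparent
relationship rests mostly on that point, which is a reason to examine the
data both with and without outliers.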
A Different View
When these data were first presented at the 1996 Applications
of Software Measurement Conference, Shari Lawrence-Pfleeger
asked the author of this article if he had looked at the data
individually, rather than the groupings by PID. He had not,
primarily because the organization of the data and extraction
tools made it relatively easy to view the data by PID, and
extremely time consuming to look at each fix. A recent review
of the data on a fix-by-fix basis revealed a slight trend
indicating the longer it took to fix a problem, the higher
the recidivism ratio.
SUMMARY: YEAR 1
It seems as if nothing is easy in software engineering. It
would have been much easier if one or more of the analyses
had shown a strong correlation, or if there had been a series
of obvious outliers that would have guided corrective action.
One might ask, "What did Bull gain by this effort?"
The company learned that:
- Quality is maintained in spite of pressures to respond
quickly.
- There is reinforcement that poor process leads to poor
results.
- Further improvements will need to be broad-based process
improvements, rather than fixing exception conditions. The
company did find several indications that suggested a process
change (source roll frequency).
- This analysis sets a baseline for future comparisons.
- Response time seems to be under control across 12 PIDs,
suggesting that resource allocation is managed fairly well.
THE NEXT YEARS
The
analysis was repeated the next two years. Several of the more
interesting observations are shown in Figure
7. One can see the recidivism ratios for the same set
of PIDs as in the first year with the addition of PID M.
Product A was still a problem; however, it was replaced by
a new PID (M) as the worst behaved. PID M was
the result of a project with poor documentation and no inspections,
creating a repeat nightmare. Unfortunately, the root causes
for PID M problems were well established before the lessons
of the previous year were available. The average recidivism
ratio was up considerably, from 4.1 percent to 5.4 percent,
a 31.7 percent increase. Since the STAR volume for the year
was nearly double (reflecting the increased number of installations
of the new release), this suggests the STAR volume has an
influence on recidivism ratio. If the two worst PIDs are removed
(A and M), however, the difference is 3.9 percent to 4.4 percent
on a year-to-year comparison (12.8 percent). This suggests
that controlling the volume of changes might improve the recidivism
ratio.
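The year-to-year comparisons above are relative changes between two
ratios. As a quick check of the arithmetic, the following Python sketch
reproduces both figures; the function name percent_change is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    if old == 0:
        raise ValueError("relative change is undefined when the old value is 0")
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # All PIDs: recidivism ratio of 4.1 percent rising to 5.4 percent.
    print(round(percent_change(4.1, 5.4), 1))
    # With PIDs A and M removed: 3.9 percent rising to 4.4 percent.
    print(round(percent_change(3.9, 4.4), 1))
```

The two results round to 31.7 and 12.8 percent, matching the values quoted
in the text. Note that these are percentage changes of a ratio, not
differences in percentage points (1.3 and 0.5 points, respectively).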
LESSONS LEARNED, ACTIONS TAKEN, AND RESULTS
The lessons learned from the second year of data reinforce
the first year of data, that low quality products are difficult
to maintain and will have a high(er) defect ratio in repair
than well-designed products. This is not an earth-shattering
revelation, but sometimes one must have these data to prove
the "pay me now or pay me later" rule holds for
software just as well as for oil filters and auto engines.
There were several significant changes in the processes (or
enforcement of processes) in the following years.
First, the company realized the existing maintenance processes
were not capable of better performance and instituted a major
change in the testing process. EPs that were distributed beyond
the site-specific correction were subjected to longer testing.
This delayed the availability of the correction, but is in
agreement with the recognized defect to failure ratio relationship
published by Adams (1984). Second, the company reduced the
frequency of EPs. By taking a more conservative approach to
distributing corrections (again consistent with the Adams
work, especially in the second or third year of corrective
maintenance), it avoided taking site-specific corrections
to a second site where they were not needed. This effectively
reduced the recidivism ratio to zero. (Since the middle of
1997, eight or more months of each year have seen a 0 percent
recidivism ratio.) The change was so significant that the company was
not able to repeat the analysis in the third year. The number
of defective fixes was so low that it was not possible (or
necessary) to use the same analysis methods.
Third, the company strictly controlled the development processes
on the next release. It introduced defect-depletion tracking
across the full development life cycle. It developed quality
goals for the next release, and predicted and measured conformance
to these goals (Weller 2000). The stated objective of development
was to put the SMT out of business by reducing
the STAR rates to the point where the volume would not sustain
a separate maintenance organization.
Cure Worse Than the Disease?
Quantifying a defective fix
ratio from industry publications is an interesting
exercise. Weinberg (1986) suggests fixes have defect
ratios as high as 50 percent for one-line changes, increasing
as size increases to about five lines (75 percent),
and then decreasing to 35 percent as size grows to 20
lines. That article did not say when the numbers were
measured. Are they the injection ratios before any defect
removal activities, or are these numbers the customer-detected
ratios? Capers Jones (1991) suggests the bad-fix injection
ratio should be less than 2.5 percent (but does not
say if this is the injection ratio or customer discovery
ratio, although customer discovery ratio is implied).
In 1994 an article in the IBM Systems Journal
stated the measured corrective maintenance was a
fraction of 1 percent defective fixes, with an
expectation of a 0 percent ratio (Bencher 1994). As
seen in Figure 6, Bull's numbers varied from 0
percent to 36 percent depending on when and where the
measurements were made. Beware of numbers without explanation
or quantification!
THE PAYBACK
One
measure of improvement from release to release is to measure
the reported defects against the system-years of operation.
This normalizes for different migration rates and number of
systems. Figure 8 shows
the comparison between the release that generated the data
in this article with its follow-on release. The follow-on
data were subdivided to show the contribution of the new products
in the follow-on release. The follow-on release was as stable
as its mature predecessor 12 months after initial exposure
because the defect ratio of the new product was significantly
better than its predecessor. The current ratio is about 40
times better, but actual exposure of the post-Y2K release
is still limited. The company expects the final result to
be in the 10-to-20-to-1 range. Note that not all of these gains
were due to the results of this study, but several key
changes, including the absolute need for design and inspection,
were rigorously enforced on all products going into the follow-on
release.
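Normalizing reported defects by system-years of operation can be sketched
as follows. This is a minimal illustration, assuming each system
contributes exposure from its installation date to the measurement date;
the function names and the 365.25-day year are assumptions, not details
taken from the study.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed average year length


def system_years(install_dates, as_of):
    """Total years of operation accumulated by all installed systems."""
    # Systems installed after the measurement date contribute no exposure.
    total_days = sum((as_of - d).days for d in install_dates if d <= as_of)
    return total_days / DAYS_PER_YEAR


def defects_per_system_year(defect_count, install_dates, as_of):
    """Reported defects divided by accumulated system-years of operation."""
    exposure = system_years(install_dates, as_of)
    if exposure <= 0:
        raise ValueError("no operating exposure accumulated yet")
    return defect_count / exposure


if __name__ == "__main__":
    # Illustrative only: two systems installed on the same made-up date.
    installs = [date(1999, 1, 1), date(1999, 1, 1)]
    print(defects_per_system_year(4, installs, date(2000, 1, 1)))
```

Because the denominator grows with both the number of installations and
their time in service, two releases with different migration rates can be
compared on the same scale.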
SUMMARY
What did Bull Information Systems learn? Obviously, the first
lesson is "do it right the first time." Looking
strictly at the analysis of the SMT process, the company concluded
the existing defect removal processes were delivering the
best the company could expect, and that any improvements would
require major changes to the process.
Are these results transferable to another company? To investigate
the approach used to do this analysis, readers can try the
following with their own data:
- Find ways to divide data into repeating or related groups,
whether time based, product identity, or other division.
- Consider scatter plots and values by group (for example,
see Figure 1).
- Look at the outliers (best case/worst case).
If there are outliers, see if one can reconstruct their history.
Look at both the good and bad to propagate the good processes
as well as avoiding the bad (or nonexistent) processes.
A few warnings might include:
- Do not be fooled by coincidental correlations. Do not accept
a result simply because the data look the way one expects or
wants them to.
- If a company has to maintain its products, be sure to
document the designs, and inspect the designs and code.
If responsible for product maintenance, refuse to accept
a product that fails to meet these criteria, or allow sufficient
budget to handle a high rate of problems (and repeat problems).
- Look for distortions caused by a large subset of data
from one project.
- Do not ignore outliers, but do look at the data with
outliers removed.
- Ensure data sets have a large enough population.
- Consider product age: are older products better or
worse than newer products?
- Consider the people who are doing the work: training,
familiarity with product, and so on, but never blame
the people for the problems.
- Apply common sense!
ACKNOWLEDGMENTS
Without the detailed EP data collected by Jim Hensley, this
study would not have been possible. I would also like to thank
the reviewers for their many valuable comments.
Much of this work was presented in 1996 and 1997 at the Applications
of Software Measurement Conferences. The material is presented
here with the benefit of four to five years of hindsight,
and the results of some of the improvements made since then.
REFERENCES
Adams, E. 1984. Optimizing preventive service of software
products. IBM Journal of Research and Development 28,
no. 1.
Bencher, D. L. 1994. Programming quality improvement in IBM.
IBM Systems Journal 33, no. 1: 218-219.
Jones, C. 1991. Applied software measurement. New
York: McGraw-Hill.
Weinberg, G. 1986. Kill that code. IEEE Tutorial on Software
Restructuring. Washington, D. C.: IEEE Computer Society
Press.
Weller, E. F. 1993. Lessons from three years of inspection
data. IEEE Software (September): 38-45.
Weller, E. F. 1995. Applying statistical process control
to software maintenance. In Proceedings of the Applications
of Software Measurement Conference. Orange Park, Fla.:
Software Quality Engineering.
Weller, E. F. 1996. Managing software maintenance with
metrics. In Proceedings of the Applications of Software
Measurement Conference. Orange Park, Fla.: Software Quality
Engineering.
Weller, E. F. 2000. Practical applications of statistical
process control. IEEE Software (May/June): 48-55.
BIOGRAPHY
Ed Weller is a Fellow at Bull HN Information Systems
in Phoenix, Ariz., where he is responsible for the software
processes used for their mainframe operating systems group.
Prior to joining Bull HN, he was a technical staff engineer
and manager of the Systems and Software Engineering Process
Group at Motorola's Satellite Communications Division.
He is an authorized lead assessor in the Software Engineering
Institute's Appraiser Program for CMM-based Appraisals
for Internal Process Improvement. Weller received the IEEE
Software "Best Article of the Year" award for his
September 1993 article, "Lessons From Three Years of
Inspection Data," and was awarded "Best Track Presentation"
at the 1994 Applications of Software Measurement Conference
for "Using Metrics to Manage Software Projects."
Weller has more than 30 years of experience in hardware,
test, software, and systems engineering of large-scale hardware
and software projects and is a Senior member of IEEE. He has
a bachelor's degree in electrical engineering from the
University of Michigan and a master's degree from the
Florida Institute of Technology. Weller can be reached at
Bull Information Systems, Inc., 13430 N. Black Canyon, MS
Z-68, Phoenix, AZ 85029 or by e-mail at Ed.Weller@Bull.com.