American Society for Quality
Software Quality Professional


December 2000
Volume 3 • Number 1


SOFTWARE QUALITY MANAGEMENT
Applying Quantitative Methods to Software Maintenance

Software maintenance processes can generate large amounts of data that are useful in evaluating a maintenance operation. The challenge is often finding ways to make sense of the data, breaking it down into meaningful indicators of quality, productivity, and predictability. Data from three years of software maintenance activities were used in the analysis. Detailed results for the first year are presented in this article with references to following years’ analysis. The questions that prompted the analysis and the answers to the questions, as well as the follow-up results of a major process change, are included.

Key words: corrective maintenance, defective fixes, quantitative analysis, software process, stable process

by Ed Weller, Bull HN Information Systems, Inc.

INTRODUCTION

A major release of Bull Information Systems’ GCOS 8 Operating System created the need to understand in detail the operation of its software maintenance team (SMT), the group responsible for software maintenance. (Maintenance in this discussion refers to corrective maintenance, that is, fixing defects reported by customers.) By looking at the SMT process data, the company wanted to find what was working well and duplicate those processes, as well as identify less effective processes and eliminate them. It also wanted to measure the typical process metrics to allow year-to-year comparisons to evaluate the impact of process changes, as well as to evaluate the performance of the team against perceptions of its performance. This article concentrates on the metrics used by the SMT and shows how quantitative methods can be used to evaluate maintenance work. The evaluations cover three years of data. Several of the process changes introduced as a result of the studies are also covered.

DATA CHARACTERISTICS

The GCOS 8 Operating System (exclusive of third-party products) includes about 7 million lines of source code split across about 320 product IDs (PIDs). A major release in 1995 resulted in an expected flow of customer reported defects. As these defects were repaired, the company monitored a number of the process metrics to evaluate the effectiveness of the repair process. The GCOS 8 Operating System is divided into logical groupings, or PIDs, that might be as large as 400,000 lines of source code and contain up to 300 modules. In some cases, several PIDs combine to provide the larger product areas within the operating system, such as the memory manager, COBOL Compiler, or I/O Supervisor.

In 1995, 82 percent of the fixes were isolated to 5 percent of the (320) PIDs, and in 1996, 84 percent of the fixes were isolated to 5 percent of the PIDs. By concentrating on the 15 PIDs with the highest change volume, the data analysis effort was manageable. In three cases, PIDs that covered the same product area were combined, so most of the charts show only 12 data points. The 1996 data were extended to include a 13th PID.
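The concentration of fixes in a few PIDs can be checked with a simple count. The following is a minimal sketch, assuming one record per fix tagged with its PID; the function name, parameters, and sample data are hypothetical and are not taken from the PASS system.

```python
from collections import Counter


def share_of_top_groups(fix_pids, top_fraction, total_groups):
    """Fraction of all fixes that fall in the busiest `top_fraction` of groups.

    `total_groups` is passed explicitly because groups with no fixes
    never appear in the fix records.
    """
    counts = Counter(fix_pids)
    n_top = max(1, round(total_groups * top_fraction))
    top = counts.most_common(n_top)
    return sum(count for _, count in top) / len(fix_pids)


# Hypothetical data: 10 PIDs exist, but only four of them received fixes.
fixes = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
share = share_of_top_groups(fixes, top_fraction=0.2, total_groups=10)
print(f"{share:.0%} of fixes fall in the top 20% of PIDs")
```

With real data, the same call with `top_fraction=0.05` and `total_groups=320` would give the kind of figure quoted above.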

Grouping by PIDs appears to be a logical approach, in that PIDs have a common technical domain or product area, and for the most part, the people working on problems in each PID are the same. Problem analysis, resolution, and verification methods within each group tend to be consistent, so differences in the metrics from the different PIDs might indicate processes that should be transferred from one group to another. Since this grouping may also be sensitive to the relative complexity of the different product areas, as well as the familiarity of the maintenance staff with the product areas, some judgment was required when evaluating these differences.

SMT PROCESS

The SMT was set up as a separate organization to allow a dedicated team of maintenance specialists to focus on customer reported problems. Key elements of the process include daily reviews of open problems, pairing specialists on problems that have not been resolved within goals (or at the request of the person working the problem), and critical problem reviews (CPR) for especially difficult or longstanding problems. Over several years, the productivity of the SMT, measured by the effort per system technical action request (STAR), more than doubled. Metrics are collected using an internally developed problem tracking system called problem analysis and solution system (PASS). This system records the usual data elements seen in today’s problem tracking systems. Its 58 fields include date opened/closed, customer, priority, assigned to, closure code, and PID. This system also provides links into the company’s configuration management system to allow the specific code transmittals issued to correct a problem to be tracked against that problem.

Additional data include a separate database for all fixes shipped as site-specific corrections (SSCs) and emergency patches (EPs). In particular, this database is used to track replacements to EPs and the reason for the replacement (such as bad fix, regression, fix overlap–current fix overlays a previous one). Also, all fixes are inspected, and the inspection results are captured in a site database that has inspection results starting in 1990. The primary linkages between these databases are the STAR number and PID.

Although the SMT uses a large number of metrics, the following are of the most interest to Bull’s customers and its management as indicators of quality and predictability.

  • Response time by problem priority level–as measured by entry to and exit from the software maintenance team (open/closed date in PASS)
  • Inspection “hit ratio,” or the percentage of inspection meetings of fixes that yield a major defect (a major defect is one that is visible to the end user). This is more useful than the usual “defects per thousand lines of code” used to measure inspection effectiveness.
  • Defective fix ratio or “recidivism ratio.” This is a measure of defective fixes shipped to customers, and does not include defects found via inspection or test, although the company does have these numbers and uses them to identify above/below effectiveness in inspection and test as a means of identifying process improvements.
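The three metrics above can be sketched as simple ratios over fix records. This is only an illustration: the record layout and field names are hypothetical, and it assumes one inspection per fix, which the article does not state.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FixRecord:
    # Hypothetical layout; the real PASS records have 58 fields.
    pid: str
    opened: date
    closed: date
    inspection_found_major: bool  # inspection of this fix found a major defect
    defective_in_field: bool      # fix shipped to a customer and had to be replaced


def summarize(fixes):
    """Return mean response time in days, inspection hit ratio, recidivism ratio."""
    n = len(fixes)
    return {
        "mean_response_days": sum((f.closed - f.opened).days for f in fixes) / n,
        "inspection_hit_ratio": sum(f.inspection_found_major for f in fixes) / n,
        "recidivism_ratio": sum(f.defective_in_field for f in fixes) / n,
    }


# Invented sample data, for illustration only.
sample = [
    FixRecord("A", date(1995, 1, 1), date(1995, 1, 11), True, False),
    FixRecord("A", date(1995, 1, 5), date(1995, 1, 25), False, True),
    FixRecord("B", date(1995, 2, 1), date(1995, 2, 11), False, False),
    FixRecord("B", date(1995, 2, 1), date(1995, 2, 21), False, False),
]
metrics = summarize(sample)
print(metrics)
```

Filtering the records by PID or by priority before calling `summarize` gives the per-group values that the charts in this article compare.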

“Quality” of the corrective maintenance process in the eyes of Bull’s customers means the defect is fixed on the first attempt. Response time as measured in the company’s problem tracking system is the end-to-end time in the SMT for the final fix transmitted to the source control system. Typically, a work-around, avoidance, or temporary fix is provided on a site-specific basis as soon as possible. The permanent fix that is made available across the customer base goes through a longer development and test cycle, which is the “response time” used in this article.

WHAT DID THE COMPANY WANT TO DISCOVER?

Once these metrics are selected as the quality measure of the work, as seen by Bull’s customers, what questions might one ask about them that will reveal something (useful) about the SMT process? This article looks at the following:

  • Is there a relationship between the volume of STARs and response time?
  • Is there a relationship between the PID in which the defect is repaired and response time?
  • Is there a relationship between the volume of STARs or PID and quality?
  • Does inspection effectiveness affect quality?
  • Does response time affect quality?

Answers to these questions should enable the company to answer a number of process related questions:

  • Is the company applying its resources optimally?
  • Would a detailed look at the data show obvious areas for process improvement?
  • Does the volume of change or length of response time affect quality in any way? (There was a perception in some areas that faster response time was accomplished by “rushing the fixes” and tacitly accepting lower quality.)

Introducing inspections from 1990 to 1993 had reduced the recidivism ratio to one-third of the preinspection values (Weller 1993). Over the four years the company gained another 33 percent improvement, but the ratio had stabilized at 4 percent. Bull was looking to better understand the underlying processes to see what could be changed to improve further.

ANALYSIS TECHNIQUES

The data available for analysis were limited by what was collected in the problem reporting system, the site inspection database, and the EP database (“emergency patch,” while historically a meaningful definition, now means an object code fix shipped between releases) kept by the change control board that manages EPs. The company did not have the luxury of adding additional data to the problem reporting system necessary to conduct rigorous experiments. The analysis was a “do what you can do with the available data” project.

Scatter plots, correlation analysis, and detailed comparisons of metrics across PIDs were used to indicate trends, relationships, and best-case/worst-case process differences between the extreme performance groups. The drawing in Figure 1 illustrates this process. A large amount of data are available for analysis, but the best way to look at the data is not obvious.

DATA RESULTS AND INTERPRETATION

In the following figures, a STAR is a system technical action request (problem report) that may be caused by a product defect, user misunderstanding of product documentation, duplicate error report, and so on.

In the following charts, “response time” is the end-to-end time spent in the SMT for analysis, problem correction, inspection, verification, and release into the source code control system. “Recidivism” is the defective fix ratio for the problem report. “Inspection hit ratio” is the percentage of inspections of fixes that find major defects.

Figure 2 shows the response time for priority A (highest) STARs across a group of 12 PIDs. In Figure 2, PIDs A, H, and I have the largest positive (longer response time) variation. Products A and H were developed in the same project, and are consistently on the outer limits in all of the charts in this article. This project was “grandfathered” into the release without being inspected, which caused poor quality product turnover to the maintenance group. Product I will be discussed in the next section.

Lessons Learned

  • Inspections work. (This is a no-brainer, but the decision at the time of the grandfathering not to inspect completed work was clearly wrong. There is always time to do it right.)
  • The company needed to define exit criteria for turnover from development to maintenance.

DEVELOPMENT IMPACT ON MAINTENANCE

Figure 3 shows a plot of the recidivism ratio by PID. Again, PID A is clearly out of the box, with a high response time and lower quality. The analysis of the problems with this PID indicates the design was not documented and that it was not inspected. Without design documentation, the only way the team could decipher the design was by reverse engineering the code. The high defect ratio that was visible meant the rest of the code could not be trusted to reflect the required design, and the original designer was not available to question. This is probably the worst-case scenario for software maintenance.

Two PIDs (I and L) have a zero defect ratio. Two items were noted as possible causes: long familiarity with the product by the maintainers, and recent “source rolls” that “cleaned up” one of the PIDs. Also, PID I had a longer response time (see Figure 2), which might indicate a relationship between response time and quality, which will be investigated later. Source rolls are a method of integrating a series of overlapping changes into “clean” source code. One of the characteristics of the legacy, proprietary language used in parts of the operating system is that there is a tradeoff between frequent source rolls and the cost of the activity.

Lessons Learned

  • Evaluate “source roll” (source-code cleanup) frequency to ensure the proper balance between the cost of the source roll and increased maintenance effort.
  • Improve product familiarity. (This is another no-brainer. If the work force is already optimally allocated, however, moving people around will not work, so product training may be the only way to achieve this end. Ultimately, this may mean on-the-job training or experience.)

RESPONSE TIME VS. RECIDIVISM RATIO

The discussion of the data in Figure 3 suggests response time might have an impact on quality. This relationship is shown in Figure 4. There does not appear to be sufficient correlation to draw any conclusions. The correlation is low, and the point above 10 percent is the now infamous PID A.

Lessons Learned

  • Response time did not appear to impact the quality of the work, which was contrary to a view that the faster response time had been accomplished by “rushing” the fixes and accepting lower quality.

Figure 5 plots the inspection hit ratio (percentage of inspections of fixes that find defects) against the volume of fixes to see if a higher change rate affected the inspection results. This set of data has one of the higher correlation values in this analysis (R = .63), an indication that as the volume of inspections in a product area increases, the team finds more defects in the rework. The reasons could include:

  • More defects caused by a higher volume of work, providing more opportunities for the inspection teams
  • More inspections allow the team to become more proficient in finding defects

As a test for the first hypothesis, the company looked at the relationship of inspection hit ratio to recidivism, shown in Figure 6.

Readers may have heard that “defective products stay defective through the entire debug cycle.” Figure 6 is an attempt to see if this holds true for fixes. It has the highest correlation (R = .68) seen so far, but it is still questionable that product quality can be predicted by inspection hit ratio. If PID A in the upper right corner is removed, the relationship (R = .56) is less conclusive.
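Checking a correlation with and without a suspected outlier, as done here for PID A, can be sketched as follows. The article does not say how R was computed; this sketch assumes the standard Pearson coefficient, and the data points are invented to show how one extreme point can dominate the result. They are not Bull's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_without(xs, ys, drop_index):
    """Pearson r recomputed with one point removed."""
    keep = [i for i in range(len(xs)) if i != drop_index]
    return pearson_r([xs[i] for i in keep], [ys[i] for i in keep])


# Hypothetical per-PID values (percent); the last point plays the outlier.
hit_ratio = [5, 10, 15, 20, 40]
recidivism = [3, 1, 4, 2, 12]

r_all = pearson_r(hit_ratio, recidivism)
r_trimmed = r_without(hit_ratio, recidivism, drop_index=4)
print(f"R with all points: {r_all:.2f}")
print(f"R without the outlier: {r_trimmed:.2f}")
```

In this invented example the apparent relationship rests entirely on the outlier, which is the reason for looking at the data both ways before drawing a conclusion.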

A Different View

When these data were first presented at the 1996 Applications of Software Measurement Conference, Shari Lawrence-Pfleeger asked the author of this article if he had looked at the data individually, rather than the groupings by PID. He had not, primarily because the organization of the data and extraction tools made it relatively easy to view the data by PID, and extremely time consuming to look at each fix. A recent review of the data on a fix-by-fix basis revealed a slight trend indicating the longer it took to fix a problem, the higher the recidivism ratio.

SUMMARY: YEAR 1

It seems as if nothing is easy in software engineering. It would have been much easier if one or more of the analyses had shown a strong correlation, or if there had been a series of obvious outliers that would have guided corrective action. One might ask, “What did Bull gain by this effort?” The company learned that:

  1. Quality is maintained in spite of pressures to respond quickly.
  2. There is reinforcement that poor process leads to poor results.
  3. Further improvements will need to be broad-based process improvements, rather than fixing exception conditions. The company did find several indications that suggested a process change (source roll frequency).
  4. This analysis sets a baseline for future comparisons.
  5. Response time seems to be under control across 12 PIDs, suggesting that resource allocation is managed fairly well.

THE NEXT YEARS

The analysis was repeated the next two years. Several of the more interesting observations are shown in Figure 7. One can see the recidivism ratios for the same set of PIDs as in the first year with the addition of PID M.

Product A was still a problem; however, it was replaced by a new PID (M) as the “worst behaved.” PID M was the result of a project with poor documentation and no inspections, creating a repeat nightmare. Unfortunately, the root causes for PID M problems were well established before the lessons of the previous year were available. The average recidivism ratio was up considerably, from 4.1 percent to 5.4 percent, a 31.7 percent increase. Since the STAR volume for the year was nearly double (reflecting the increased number of installations of the new release), this suggests the STAR volume has an influence on recidivism ratio. If the two worst PIDs (A and M) are removed, however, the year-to-year change is from 3.9 percent to 4.4 percent (a 12.8 percent increase). This suggests that controlling the volume of changes might improve the recidivism ratio.
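The year-to-year comparisons quoted here are relative increases in the recidivism ratio, not differences in percentage points. A short check of the arithmetic, using the ratios given in the text (the function name is illustrative):

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


all_pids = percent_increase(4.1, 5.4)       # all PIDs
without_worst = percent_increase(3.9, 4.4)  # PIDs A and M removed
print(round(all_pids, 1))       # 31.7
print(round(without_worst, 1))  # 12.8
```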

LESSONS LEARNED, ACTIONS TAKEN, AND RESULTS

The lessons learned from the second year of data reinforce the first year of data, that low quality products are difficult to maintain and will have a high(er) defect ratio in repair than well-designed products. This is not an earth-shattering revelation, but sometimes one must have these data to prove the “pay me now or pay me later” rule holds for software just as well as for oil filters and auto engines. There were several significant changes in the processes (or enforcement of processes) in the following years.

First, the company realized the existing maintenance processes were not capable of better performance and instituted a major change in the testing process. EPs that were distributed beyond the site-specific correction were subjected to longer testing. This delayed the availability of the correction, but is in agreement with the recognized defect-to-failure ratio relationship published by Adams (1984). Second, the company reduced the frequency of EPs. By taking a more conservative approach to distributing corrections (again consistent with Adams’ work, especially in the second or third year of corrective maintenance), it avoided taking site-specific corrections to a second site where they were not needed. This effectively reduced the recidivism ratio to zero. (Since the middle of 1997, eight or more months of each year have seen a 0 percent recidivism ratio.) The change was so significant that the company was not able to repeat the analysis in the third year. The number of defective fixes was so low that it was not possible (or necessary) to use the same analysis methods.

Third, the company strictly controlled the development processes on the next release. It introduced defect-depletion tracking across the full development life cycle. It developed quality goals for the next release, and predicted and measured conformance to these goals (Weller 2000). The stated objective of development was “to put the SMT out of business” by reducing the STAR rates to the point where the volume would not sustain a separate maintenance organization.

 

Cure Worse Than the Disease?


Quantifying a “defective fix ratio” from industry publications is an interesting exercise. Weinberg (1986) suggests fixes have defect ratios as high as 50 percent for one-line changes, increasing as size increases to about five lines (75 percent), and then decreasing to 35 percent as size grows to 20 lines. That article does not say when the numbers were measured. Are they the injection ratios before any defect removal activities, or are these numbers the customer-detected ratios? Capers Jones (1991) suggests the bad-fix injection ratio should be less than 2.5 percent (but does not say if this is the injection ratio or customer discovery ratio, although customer discovery ratio is implied). In 1994 an article in the IBM Systems Journal stated the measured corrective maintenance was “a fraction of 1 percent” defective fixes, with an expectation of a 0 percent ratio (Bencher 1994). As seen in Figure 6, Bull’s numbers varied from 0 percent to 36 percent depending on when and where the measurements were made. Beware of numbers without explanation or quantification!

 

THE PAYBACK

One measure of improvement from release to release is to measure the reported defects against the system-years of operation. This normalizes for different migration rates and number of systems. Figure 8 shows the comparison between the release that generated the data in this article with its follow-on release. The follow-on data were subdivided to show the contribution of the new products in the follow-on release. The follow-on release was as stable as its mature predecessor 12 months after initial exposure because the defect ratio of the new product was significantly better than its predecessor. The current ratio is about 40 times better, but actual exposure of the post-Y2K release is still limited. The company expects the final result to be in the 10-to-20-to-1 range. Note that not all of these gains were due to the results of this study, but several key changes, including the absolute need for design and inspection, were rigorously enforced on all products going into the follow-on release.
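Normalizing reported defects by system-years of operation can be sketched as follows. This is a minimal illustration, assuming the months in service are known for each installed system; the function name and all numbers are hypothetical and do not reproduce the data behind Figure 8.

```python
def defects_per_system_year(defects, months_in_service):
    """Reported defects divided by total exposure in system-years.

    `months_in_service` holds one entry per installed system.
    """
    system_years = sum(months_in_service) / 12
    return defects / system_years


# Hypothetical releases with different installation histories.
earlier = defects_per_system_year(30, [12, 6, 6])        # 2 system-years
follow_on = defects_per_system_year(3, [12, 12, 12, 12])  # 4 system-years
print(earlier, follow_on, earlier / follow_on)
```

Because the denominator is exposure rather than calendar time, a release that is still being installed can be compared with one that is already widely deployed.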

SUMMARY

What did Bull Information Systems learn? Obviously, the first lesson is “do it right the first time.” Looking strictly at the analysis of the SMT process, the company concluded the existing defect removal processes were delivering the best the company could expect, and that any improvements would require major changes to the process.

Are these results transferable to another company? To investigate the approach used to do this analysis, readers can try the following with their own data:

  • Find ways to divide data into repeating or related groups, whether time based, product identity, or other division.
  • Consider scatter plots and values by group (for example, see Figure 1).
  • Look at the outliers (best case/worst case).

If there are outliers, see if one can reconstruct their history. Look at both the good and bad to propagate the good processes as well as avoiding the bad (or nonexistent) processes.
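The grouping and best-case/worst-case steps above can be sketched as follows. The group keys and values are hypothetical; any grouping (time period, product identity, or other division) and any per-record metric could be substituted.

```python
from collections import defaultdict
from statistics import mean


def group_means(records):
    """Average a metric per group from (group_key, value) pairs."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


def best_and_worst(means):
    """Return the group keys with the lowest and highest mean value."""
    ranked = sorted(means, key=means.get)
    return ranked[0], ranked[-1]


# Hypothetical records: (PID, 1 if the fix was defective else 0).
records = [
    ("A", 1), ("A", 0), ("A", 1),
    ("H", 0), ("H", 1), ("H", 0), ("H", 0),
    ("I", 0), ("I", 0),
]
means = group_means(records)
best, worst = best_and_worst(means)
print(f"best: {best}, worst: {worst}")
```

The ranking only identifies which groups to investigate. Reconstructing their history, as described above, is still a manual step.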

A few warnings might include:

  • Do not be fooled by coincidental correlations. Do not accept a result just because the data look the way one expects or wants them to.
  • If a company has to maintain its products, be sure to document the designs, and inspect the designs and code. If responsible for product maintenance, refuse to accept a product that fails to meet these criteria, or allow sufficient budget to handle a high rate of problems (and repeat problems).
  • Look for distortions caused by a large subset of data from one project.
  • Do not ignore outliers, but do look at the data with outliers removed.
  • Ensure data sets have a large enough population.
  • Consider product age–are older products better or worse than newer products?
  • Consider the people who are doing the work–training, familiarity with product, and so on, but never blame the people for the problems.
  • Apply common sense!

ACKNOWLEDGMENTS

Without the detailed EP data collected by Jim Hensley, this study would not have been possible. I would also like to thank the reviewers for their many valuable comments.

Much of this work was presented in 1996 and 1997 at the Applications of Software Measurement Conferences. The material is presented here with the benefit of four to five years of hindsight, and the results of some of the improvements made since then.

REFERENCES

Adams, E. 1984. Optimizing preventive service of software products. IBM Journal of Research and Development 28, no. 1.

Bencher, D. L. 1994. Programming quality improvement in IBM. IBM Systems Journal 33, no. 1: 218-219.

Jones, C. 1991. Applied software measurement. New York: McGraw-Hill.

Weinberg, G. 1986. Kill that code. IEEE Tutorial on Software Restructuring. Washington, D. C.: IEEE Computer Society Press.

Weller, E. F. 1993. Lessons from three years of inspection data. IEEE Software (September): 38-45.

–––. 1995. Applying statistical process control to software maintenance. In Proceedings of the Applications of Software Measurement Conference. Orange Park, Fla.: Software Quality Engineering.

–––. 1996. Managing software maintenance with metrics. In Proceedings of the Applications of Software Measurement Conference. Orange Park, Fla.: Software Quality Engineering.

–––. 2000. Practical applications of statistical process control. IEEE Software (May/June): 48-55.

BIOGRAPHY

Ed Weller is a Fellow at Bull HN Information Systems in Phoenix, Ariz., where he is responsible for the software processes used for their mainframe operating systems group. Prior to joining Bull HN, he was a technical staff engineer and manager of the Systems and Software Engineering Process Group at Motorola’s Satellite Communications Division. He is an authorized lead assessor in the Software Engineering Institute’s Appraiser Program for CMM-based Appraisals for Internal Process Improvement. Weller received the IEEE Software “Best Article of the Year” award for his September 1993 article, “Lessons From Three Years of Inspection Data,” and was awarded Best Track Presentation at the 1994 Applications of Software Measurement Conference for “Using Metrics to Manage Software Projects.”

Weller has more than 30 years of experience in hardware, test, software, and systems engineering of large-scale hardware and software projects and is a Senior member of IEEE. He has a bachelor’s degree in electrical engineering from the University of Michigan and a master’s degree from the Florida Institute of Technology. Weller can be reached at Bull Information Systems, Inc., 13430 N. Black Canyon, MS Z-68, Phoenix, AZ 85029 or by e-mail at Ed.Weller@Bull.com.
