Reliability Division

Differing Views of Software Reliability

Samuel Keene, PhD

Background:

The author has developed and taught classes in "creativity", whereas his main technical specialty is software reliability. This report discusses the outcome of employing creative thought development to produce a richer or more total view of software reliability.

The process used is called "Stepping Stones," a term coined by Edward de Bono, the creative thinking expert. The approach is to first present an idea to a group and then let the group respond. The group responds to the initial seed idea, and its members also build on each other’s ideas. Each stepping stone has two values. First, it has its inherent value, representing the quality of its thought as applied to the problem. That is the normal way ideas are rated. Beyond that value, each idea has a transitive value in that it triggers thoughts in the remainder of the group, i.e., it is a catalyst that spawns other ideas.

The stepping stone approach works well in a brainstorming meeting. It relieves participants of the feeling that a suggestion has to be "correct" and immune to attack. Every idea has transitive value, which puts participants at ease.

Overview:

The author solicited a rather unusual view of software reliability from a distinguished colleague, who is known mostly for his work and books in reliability specialties other than software reliability. His thoughts are referred to as the "seed thought". This seed thought was sent by email to a dozen or more software reliability specialists. Nearly every recipient contributed a view to the idea progression. Such participation is a rare and satisfying event. The views received are condensed and shown below.

Seed Thought

Since errors in software are ultimately caused by people making mistakes, there is no practical or useful way by which their existence can be predicted or quantified. Then, since software operation is not time dependent (there are no processes, like wear, fatigue, etc.), errors do not generate system failures in any usefully predictable or quantifiable "rate". Of course errors are made, and failures occur, but there can be no credible math "models" for them. Furthermore, they are all different, in severity, likelihood of being observed, etc., so any "model" that says "errors =" or "failure rate =" is as useful as saying that you have, say, 47 things in your shopping cart. Ariane 5 had one software error; it was not reliable!

Note that much of this argument also applies to hardware, particularly digital stuff. The Pentium had an error. So the "numbers game" has very little practical value except in defined situations in which the underlying failure-generating processes are well understood, like the life of a light bulb or timing belt. That is why I remain skeptical about the model you (Sam Keene) have developed, and presented in Reliability Review and at the Annual R&M Symposium. The values you assign to the criteria are pure guesses, and that is neither scientific nor engineering. Software does not fail since nothing breaks.

My position is that the laws of science must be the basis of sensible prediction or measurement/extrapolation of reliability. Of course perceptions must also be allowed for, as Rod says. Therefore, any "mathematical model" that says "lambda =" for an engineering item or for software, in any sense other than a statement of what has happened in the past, is nearly always nonsense. For example, how can anyone believe in a "model" that implies that the next product will not be better than the past/present? Does Boeing not try to improve reliability from generation to generation? The "models" imply that we are doomed to repeating our mistakes, or making just as many new ones. We can plan to make fewer mistakes, but we cannot change the density of aluminum.

Responses To Seed Thoughts

The individual responses are presented below. They have been grouped by topic since several respondents focused on one or two of the premises delineated in the seed thought paragraphs. Some respondents provided extensive and broad commentary; we will include a portion of their discussion in this installment and the balance in a second installment. The large amount of material presented by the participants was more than we could accommodate in one installment.

A. On Failure Modes, Definitions, and Effects

Response 2. System Behavior Not To Requirements = Failure

Re the seed thought, I generally do not agree with him. Since when does a system have to change state to fail? A failure is any system behavior that does not meet user requirements. Individual human errors cannot be predicted, but the total group of errors can be modeled statistically. Software operation is time dependent because the input states presented to software are time dependent. As far as practicality is concerned, there are certainly a lot of people who are finding software reliability engineering useful; otherwise it would not be such a rapidly growing field.

Response 4. Failure Defined

Put succinctly: In engineering, we are not interested in producing either hardware or software. We produce FUNCTION by any means available. And clearly function can fail, because it has purpose. Anything for which there is not a human-defined purpose does not fail, because failure is human-defined. Contrariwise, whenever we define purpose we define its inverse, failure.

Thus, neither hardware nor software fails until we have a purpose for it and set it into operation. Inert software does not fail, it merely has design (or requirements) errors which are potential failures when the software is activated. Similarly, inactive hardware does not fail, it merely has conditions - something broken, which will manifest as failure when it is put into purposeful operation.

Failure is not a state of matter such as something broken. It is a state related to intention. A blown fuse is probably not failed, but has done what it was supposed to do. An intact fuse may have failed. An unexploded (intact) bomb has failed. An exploded (broken) bomb has not failed.

The problem of measuring failure rates in software (or other non-hardware parts of the system) is serious, and of prediction even more serious. That does not mean that we are not talking about failure. It only means we have a serious problem.

Response 3. Failure Is Observed Only in Execution Of Faulty Code

The failure of a software system is dependent only on what the software is currently doing: its functionality. If a program is currently executing a functionality that is expressed in terms of a set of fault free modules, this functionality will certainly execute indefinitely without any likelihood of failure. A program may execute a sequence of fault prone modules and still not fail. In this particular case, the faults may lie in a region of the code that is not likely to be expressed during the execution of a function. A failure event can only occur when the software system executes a module that contains faults. If a functionality is never selected that drives the program into a module that contains faults, then the program will never fail. Alternatively, a program may well execute successfully in a module that contains faults just as long as the faults are in code subsets that are not executed.
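The argument above can be sketched in a few lines of Python. The functionality names, module names, fault location, and usage profiles below are all hypothetical and chosen only for illustration. The sketch computes which functionalities ever reach a faulty module and, for a given usage profile, the probability that a randomly selected operation does so. This is an upper bound on the chance of failure, since executing a faulty module does not guarantee that the fault is expressed.

```python
# Hypothetical functionalities, each mapped to the set of modules it executes.
FUNCTION_MODULES = {
    "open_file": {"io", "parser"},
    "save_file": {"io", "serializer"},
    "recolor_text": {"renderer", "palette"},
}

# Assumption for illustration: only this module contains a fault.
FAULTY_MODULES = {"palette"}


def exposed_functions(function_modules, faulty_modules):
    """Return the functionalities that execute at least one faulty module."""
    return {
        name
        for name, modules in function_modules.items()
        if modules & faulty_modules
    }


def exposure_probability(profile, function_modules, faulty_modules):
    """Probability that a randomly selected operation runs a faulty module.

    `profile` maps each functionality to its relative frequency of use.
    """
    exposed = exposed_functions(function_modules, faulty_modules)
    return sum(p for name, p in profile.items() if name in exposed)


# Two hypothetical usage profiles over the same, unchanged program.
PROFILE_A = {"open_file": 0.6, "save_file": 0.4, "recolor_text": 0.0}
PROFILE_B = {"open_file": 0.3, "save_file": 0.3, "recolor_text": 0.4}

if __name__ == "__main__":
    for label, profile in (("A", PROFILE_A), ("B", PROFILE_B)):
        p = exposure_probability(profile, FUNCTION_MODULES, FAULTY_MODULES)
        print(f"profile {label}: exposure probability = {p}")
```

Under profile A the faulty module is never reached, so the program cannot fail from that fault; under profile B the same code is exposed on a substantial share of operations.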

Response 5. Erroneous Performance = Failure

I hold the position that "an act of not functioning to expectations" is a failure. In fact, the code did produce an error or an unanticipated event. The statement "software doesn't fail since it does not change state: nothing breaks" is true, but an unacceptable output from the code is generated. Software error rate or another name for the anomaly could be determined, but I'm not sure of the added value.

Response 6. Deviation From Requirement = Failure

Maybe software has overstepped its bounds in defining a "failure" to mean "deviation from required behavior." I would submit that this is a reasonable definition, since in any complex system a failure may occur unexpectedly, thus mimicking hardware failures. Even if we were to pick another word, it is still a useful *concept.* To me this argument is merely bandying semantics over whether the word "failure" is appropriate.

As far as Ariane 5 goes, the software had, no doubt, *many* errors. It had at least one significant failure that was caused by an error that happened to occur at an unpredicted time. It failed; that's a "failure" in my book. It failed catastrophically. And it gave every indication that the failure would occur every time the mission was launched. Rather a low MTTF I would say.

The seed thought is an interesting view on software reliability and software failures. It asserts that software doesn't fail since it does not change state: nothing breaks. This is a static view of the software. There are faults and defects in the software, and they become failures only once they are executed with the appropriate boundary conditions. The seed thought essentially asserts that the term "software failure" is an oxymoron.

Software failures are real and they are time dependent. Memory leaks grow with time and can eventually cause a system to crash. Round-off errors accumulate and queues overflow. Software rejuvenation (restarting the software or rebooting the system) can limit the exposure to these failures. John Musa has done a lot of work on software reliability models and his results hold up. Reliability is predictive.
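As a small illustration of the round-off point, the following Python sketch accumulates elapsed time by repeatedly adding a 0.1-second tick. The tick size and the function names are assumptions made for this example only. Because 0.1 has no exact binary floating-point representation, the running total drifts away from the true elapsed time as the process keeps running, while keeping an integer tick count and converting once avoids the build-up.

```python
TICK = 0.1  # seconds per tick; 0.1 has no exact binary floating-point form


def elapsed_naive(ticks: int) -> float:
    """Accumulate elapsed time by repeated floating-point addition."""
    total = 0.0
    for _ in range(ticks):
        total += TICK
    return total


def elapsed_from_count(ticks: int) -> float:
    """Keep an integer tick count and convert once, so error cannot build up."""
    return ticks / 10


if __name__ == "__main__":
    # Print the drift between the two methods for increasingly long runs.
    for ticks in (10, 36_000, 360_000):
        drift = elapsed_naive(ticks) - elapsed_from_count(ticks)
        print(f"{ticks:>7} ticks: drift = {drift:.3e} s")
```

The code is unchanged from one tick to the next; only the length of time it has been running differs, which is the sense in which such a failure mechanism is time dependent.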

Response 7. Software Failure Manifestations

Although I agree that a single line of software cannot fail, I think a software program (group of lines of code) can. A software program "fails" when it changes state from "executing" to "halted". Less severe problems, such as producing an incorrect result, might also be interpreted by a user as a failure. For example, suppose one wished to change the color of text in a spreadsheet cell. If the software program crashed when this was attempted, I think most people would agree that it had "failed". On the other hand, if the text was changed to the right color but not quite as bright as requested, many people might not notice. If the text was changed to the wrong color, I think most people would again say that it had "failed". This might be compared to a power supply whose output voltage was zero ("failed") or just slightly out of tolerance (probably not noticed).

B. On Prediction of Error Rate/Failure Rate

Response 7. Human Performance Traits Predictable

Regarding predicting software errors, I agree that it is difficult, but it is necessary, since software program failures cause a significant number of system failures. For a single programmer, it seems reasonable to attempt to predict the number of errors in a new software program based on the number found in a previous one. For multiple programmers, it is their coordinated ability which is important. One way to quantify this is by evaluating the software engineering process. I think this forms the basis for most "prediction" models.

Once a software program is in use, its failure rate depends on usage but it can still be measured. For my text color example, an engineer might never try to change the color of text in a spreadsheet cell and therefore report no failures. An accountant, however, would probably consider it a failure if his spreadsheet program turned negative numbers green instead of red. If the software and users remained the same, the failure rate might be constant, but users come and go, and the software changes to fix the failures reported. Consequently, estimating the software program failure rate is also difficult, but necessary, because it often accounts for a significant share of the system failure rate.

Response 9. Complexity, Bugs and Code Path Usage

I suppose the seed thought is right, but this sounds like a semantics game to me. When the product fails to perform its required and designed function, it is a failure. The failure is due not to wear-out or breakage but to design. I disagree with the statement about predictability. The complexity of a code design, as measured by module size (number of lines), number of code paths, and nesting levels, has been correlated with the number of software bugs. See Practical Software Metrics for Project Management and Process Improvement by Grady (approximately pages 70-100). One specific measure of complexity is cyclomatic complexity. There are charts on defect density vs. complexity, etc. The likelihood of a fault being observed could also be related to code path usage, which can be measured for different applications of the product.
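For readers unfamiliar with cyclomatic complexity, the following is a minimal sketch of how it might be approximated for Python source using the standard-library ast module. It counts decision points and adds one, which matches McCabe's measure for simple structured code; the particular set of node types counted and the sample function are assumptions of this sketch, not the procedure used in the cited book.

```python
import ast

# Node types treated as decision points (an assumption of this sketch).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity as decision points + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two additional branches.
            decisions += len(node.values) - 1
    return decisions + 1


# Hypothetical module under measurement.
SAMPLE = '''
def classify(x):
    if x < 0 and x != -1:
        return "negative"
    for i in range(x):
        if i % 2 == 0:
            continue
    return "done"
'''

if __name__ == "__main__":
    print("approximate cyclomatic complexity:", cyclomatic_complexity(SAMPLE))
```

A figure like this, collected per module alongside defect counts, is the kind of data on which a complexity-versus-defect-density chart would be built.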

Response 9. Logarithmic Model Predicts Acceptably

I don't necessarily agree with the seed assertion that "since errors in software are ultimately caused by people making mistakes, there is no practical or useful way by which their existence can be predicted or quantified."

All companies have some process that drives software development. Some may be better defined or better followed than others, but there is always some underlying process. Additionally, those responsible for software development tend to move from project to project rather than being replaced every time their company wants to develop an enhancement or the next "ground-breaking" product. Because of these two facts there is, in fact, a predictability to the code that they develop.

If what the seed thought meant to say was that it's impossible to apply a textbook model directly to any software product and accurately determine its expected failure rate, I'd have to agree. But we don't even do that with hardware. We have spent considerable time refining the 217 model through manipulation of part family PI-Q's to obtain a 15% correlation between predicted and field actuals. The same approach must be taken with the software models to obtain any accuracy.

We examine fault introduction/removal rates and their relationship to software failure rate for previous versions of our disk products. Many of the same people, using the same development processes, are developing our enhancements and follow-on products. Against this background, we've found that a modification of Musa's Logarithmic Model has worked well to predict the performance of those products. Reliability practitioners need to find the model that best fits their data (or develop their own) and continually refine it to obtain the desired correlation.
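For reference, the standard (unmodified) form of the Musa-Okumoto logarithmic Poisson execution time model can be sketched as follows. This is not the respondent's modified model, whose details are not given here. In the standard form, the failure intensity after execution time tau is lambda0 / (lambda0 * theta * tau + 1) and the expected number of failures is ln(lambda0 * theta * tau + 1) / theta, where lambda0 is the initial failure intensity and theta is the intensity decay parameter. The parameter values below are hypothetical.

```python
import math


def expected_failures(tau: float, lambda0: float, theta: float) -> float:
    """Expected cumulative failures after execution time tau."""
    return math.log(lambda0 * theta * tau + 1.0) / theta


def failure_intensity(tau: float, lambda0: float, theta: float) -> float:
    """Failure intensity (failures per unit execution time) at time tau."""
    return lambda0 / (lambda0 * theta * tau + 1.0)


if __name__ == "__main__":
    # Hypothetical parameters: 10 failures per CPU hour initially,
    # decay parameter 0.02 per failure.
    lambda0, theta = 10.0, 0.02
    for tau in (0.0, 10.0, 100.0):
        mu = expected_failures(tau, lambda0, theta)
        lam = failure_intensity(tau, lambda0, theta)
        print(f"tau={tau:6.1f}  expected failures={mu:7.2f}  intensity={lam:6.3f}")
```

In practice the two parameters would be estimated from failure data on earlier releases and refined as field data accumulates, in the spirit of the refinement described above.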

C. Amount and Duration of Usage; Wear, Degradation

Response 1. Changes During Operation Analogous To Wear

Despite the fact that the software doesn't experience wear, the environment that it operates within does experience many changes analogous to wear. I have observed the effects of types of "wear" in software-intensive systems. Examples are:

  • Duration-related memory leaks and fragmentation can degrade execution time, often causing reduced throughput and eventual system failure.
  • Changes in the operating system versions or separate subsystems can affect performance and reliability. The main program has not changed but the environment around it has.
  • As code ages and is patched with changes, its entropy and complexity increase, making it more failure prone.
  • Hardware degradation that affects system performance can be experienced in many ways: increased error rates on disk drives that are old or dirty, cables and connectors that become intermittent or noisy, and so on. These examples are commonplace and have very significant reliability impacts.

Response 3. Change Occurs Over Time/Users

Many of the problems that have arisen in past attempts at software reliability determination stem from a distorted perspective. Programs do not wear out over time. If they are not modified, they will certainly not improve over time, nor will they become less reliable. The only thing that really affects the reliability of a software system is its functionality. A program may work very well for a number of years based on the functions that it is asked to execute. This same program may suddenly become quite unreliable if its mission is changed by the user.

By keeping track of the state transitions from module to module and function to function we may learn exactly where a system is fragile. This information coupled with the functional profile will tell us just how reliable the system will be when we use it as specified. Programs make transitions from module to module as they execute. These transitions may be observed. Transitions to program modules that are fault laden will result in an increased probability of failure. We can model these transitions as a stochastic process. Ultimately, by developing a mathematical description for the behavior of the software as it transitions from one module to another driven by the functionalities that it is performing, we can describe the reliability of the functionality. The software system is the sum of its functionalities. If we can know the reliability of the functionalities and how the system apportions its time among these functionalities, we can then know the reliability of the system.
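One way to make this concrete is to treat module-to-module transitions as a Markov chain with a success state, in the style of architecture-based reliability models. The sketch below is an illustration under stated assumptions, not the respondent's own method: the module names, transition probabilities, and per-visit module reliabilities are hypothetical, and each module is assumed to fail independently on each visit. The reliability of a functionality is then the probability of reaching the exit state from its starting module without any failure.

```python
EXIT = "EXIT"  # absorbing state: the functionality completed successfully


def system_reliability(start, transitions, module_reliability,
                       tol=1e-12, max_iter=10_000):
    """Probability of reaching EXIT from `start` without a module failure.

    transitions[m] maps module m to {next_state: probability}.
    module_reliability[m] is the chance m executes correctly on one visit.
    Solves r[m] = R[m] * sum(p * r[next]) by fixed-point iteration.
    """
    r = {m: 0.0 for m in transitions}
    r[EXIT] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for m, nxt in transitions.items():
            new = module_reliability[m] * sum(p * r[t] for t, p in nxt.items())
            delta = max(delta, abs(new - r[m]))
            r[m] = new
        if delta < tol:
            break
    return r[start]


# Hypothetical transition structure observed for one functionality.
TRANSITIONS = {
    "ui": {"calc": 0.7, "report": 0.3},
    "calc": {"report": 0.8, "ui": 0.2},
    "report": {EXIT: 1.0},
}

# Hypothetical per-visit reliabilities; "calc" is the fault-laden module.
RELIABILITY = {"ui": 0.999, "calc": 0.99, "report": 0.995}

if __name__ == "__main__":
    print("reliability from 'ui':",
          system_reliability("ui", TRANSITIONS, RELIABILITY))
```

Changing the transition probabilities, that is, changing how the user drives the program, changes the result even though the code and its faults stay the same.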

D. Modeling Software Reliability

Response 10. Modeling Done Well Meets Needs

The seed thought presents a very limited view of modeling and mathematics. The modeling done on fatigue/break is not the same as modeling software faults and failures. However, any model by which we gain insight has some validity. Just because I can only predict reasonably well that there will be a certain "number" of failures, but cannot predict precisely where the fault causing a failure will occur, does not mean that the "number" is not of interest. Furthermore, I am able (in properly designed cases) to partition the software so as to gain some understanding of the predicted number of failures for each part (e.g., for safety/security-critical kernels vs. other parts). Thus, I can even narrow the likelihood of "where" faults may be found. In reality, even the wearout failure of a hardware part cannot be predicted at the precise spot of failure.

Software failure (defined as a failure that is the result of a fault in software) is clearly NOT an oxymoron. If there is at least one example where software has had a fault that caused a failure, then the definition of software failure is valid. Whether your colleague believes any particular model of a particular software's failure rate is practical or useful -- to him -- is another consideration.

The key to scientific endeavor is to investigate and try to understand observable data. We do observe system failures due to software. The scientific endeavor then is to understand many things about that failure, all of which are of interest to the discipline of software reliability, which is loosely the investigation of defects, faults, and failures related to software. To understand why people make mistakes, how to prevent such mistakes, how to recognize the manifestation of such mistakes as faults in the software, how to use processes that reduce such mistakes and faults, and to measure the observable effects of reduced "software failures" during the operational use of the software is of value: practical and useful.

Response 12

Let me add some short comments to the seed position "We cannot usefully "model" such events with mathematical formulae."

  • Many have used such models to plan relevant actions, such as how much more testing to do or how many people to staff for maintenance after software release. The models developed by John Musa have been successful. Also, there is nothing unique in how AT&T develops software, or in the assumptions made, that would make the models inapplicable to other software application domains.
  • Modeling of failures in software is done in processing (execution) time and there are good explanations for using such a time measure. Such an execution time measure is application independent, as wise software engineers appreciate.
  • Some software reliability models can be explained in terms of software failure mechanisms (e.g., Musa's Basic Execution Time model and Everett's Extended Execution Time Model). The parameters of these models can be explained in terms of characteristics of software and processing patterns to which it is subjected. Using such models, one can estimate model parameters a priori (before software execution) from properties of the software and processing patterns.
  • As I said earlier, don't limit yourself to physical laws. It is at least equally important to understand the limitations of the applicability of mathematical models. I agree that we don't necessarily need more sophisticated mathematics for a more mature software reliability engineering discipline; we do need a better understanding of software, its failure mechanisms, and how they are triggered.
  • The pursuit in hardware and software reliability engineering is not finding the "model"; it is understanding the phenomenon we observe and striving to quantify what we observe in a way that explains that understanding. There are those of us who say we are closer to doing so than you believe.
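As an illustration of the planning use mentioned in the first bullet, here is a minimal sketch of Musa's Basic Execution Time model in its standard textbook form. The failure intensity after execution time tau is lambda0 * exp(-(lambda0 / nu0) * tau), where lambda0 is the initial failure intensity and nu0 is the total number of failures expected over unlimited execution time. The additional execution time needed to move from a present intensity to a target intensity is (nu0 / lambda0) * ln(present / target). The parameter values are hypothetical.

```python
import math


def basic_failure_intensity(tau: float, lambda0: float, nu0: float) -> float:
    """Failure intensity after execution time tau (basic model)."""
    return lambda0 * math.exp(-(lambda0 / nu0) * tau)


def basic_expected_failures(tau: float, lambda0: float, nu0: float) -> float:
    """Expected cumulative failures after execution time tau (basic model)."""
    return nu0 * (1.0 - math.exp(-(lambda0 / nu0) * tau))


def additional_test_time(lambda_present: float, lambda_target: float,
                         lambda0: float, nu0: float) -> float:
    """Extra execution time needed to reach the target failure intensity."""
    return (nu0 / lambda0) * math.log(lambda_present / lambda_target)


if __name__ == "__main__":
    # Hypothetical parameters: 10 failures per CPU hour initially,
    # 100 total failures expected over unlimited execution time.
    lambda0, nu0 = 10.0, 100.0
    present, target = 2.0, 0.5
    dt = additional_test_time(present, target, lambda0, nu0)
    print(f"additional execution time to go from {present} to {target}: {dt:.2f}")
```

An estimate like this, in execution time, is what would then be converted to calendar time and staffing when deciding how much more testing to schedule.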

Conclusion

The stepping stones approach successfully triggered a plethora of responses defining different views of software reliability modeling. Hopefully, this has helped readers see the subject more broadly. All of these views are models of the problem. A statement attributed to George Box, former president of the ASA, seems apropos here: "all models are wrong, some are useful". I hope readers will find use in these views and in the stepping stones approach as a stratagem for getting ideas out.
