Statistical Software Engineering
Similar increases in size and complexity are expected in all consumer electronic products as increased functionality is introduced.
Central to this report's theme, and essential to statistical software engineering, is the role of data, the realm where opportunities lie and difficulties begin. The opportunities are clear: whenever data are used or can be generated in the software life cycle, statistical methods can be brought to bear for description, estimation, and prediction. This report highlights such areas and gives examples of how statistical methods have been and can be used.
Nevertheless, the major obstacle to applying statistical methods to software engineering is the lack of consistent, high-quality data in the resource-allocation, design, review, implementation, and test stages of software development. Statisticians interested in conducting research in software engineering must acknowledge this fact and play a leadership role in making the case for the resources needed to acquire and maintain high-quality, relevant data. A statement by one of the forum participants, David Card, captures the serious problem that statisticians face in demonstrating the value of good data and good data analysis: "It may not be that effective to be able to rigorously demonstrate a 10% or 15% or 20% improvement (in quality or productivity) when with no data and no analysis, you can claim 50% or even 100%."
The cost of collecting and maintaining high-quality information to support software development is unfortunately high, but arguably essential, as the NASA case study presented in Chapter 2 makes clear. The panel conjectures that the use of adequate metrics and data of good quality is, in general, the primary differentiator between successful, productive software development organizations and those that are struggling. Traditional manufacturers have learned the value of investing in an information system to support product development; software development organizations must take heed. All too often, as a release date approaches, all available resources are dedicated to moving a software product out the door, with the result that few or no resources are expended on collecting data during these crucial periods. Subsequent attempts at retrospective analysis, whether to help forecast costs for a new product or to identify root causes of faults found during product testing, are inconclusive when speculation rather than hard data is all that is available to work with. But even software development organizations that realize the importance of historical data can get caught in a downward spiral: effort is expended on collecting data that initially are insufficient to support inferences; when the data are not being used, efforts to maintain their quality wane; and when the data finally are needed, their quality is too poor to support conclusions. The spiral has begun.
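The kind of retrospective cost forecasting described above can be sketched in a few lines. The data below are hypothetical, and the power-law model (effort = a · size^b, a common COCOMO-style form) is an illustrative assumption, not a method prescribed by this report; the point is only that such an analysis is routine once an organization has reliable historical records, and impossible without them.

```python
import math

# Hypothetical historical records: (size in KLOC, effort in person-months).
# A real organization would draw these from its project database.
history = [(10, 24), (25, 70), (50, 160), (90, 310), (120, 430)]

# Fit log(effort) = log(a) + b * log(size) by ordinary least squares.
xs = [math.log(size) for size, _ in history]
ys = [math.log(effort) for _, effort in history]
n = len(history)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
log_a = my - b * mx

def forecast_effort(size_kloc):
    """Predicted person-months for a new project of the given size."""
    return math.exp(log_a + b * math.log(size_kloc))

print(f"Forecast for a 70-KLOC project: {forecast_effort(70):.1f} person-months")
```

With no historical data, the coefficients `a` and `b` cannot be estimated at all, which is precisely the bind the downward spiral produces.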
As one means of capturing valuable historical data, efforts are under way to create repositories of data on software development experiments and projects. There is much apprehension in the software engineering community that such data will not be helpful because the relevant metadata (data about the data) are not likely to be included. The panel shares this concern because the exclusion of metadata not only encourages sometimes thoughtless analyses, but also makes it too easy for statisticians to conduct isolated research in software engineering. The panel believes that truly collaborative research must be undertaken and that it must be done with a keen eye to solving the particular problems faced by the software industry. Nevertheless, the panel recognizes benefits to collecting data on experimentation in software development. As is pointed out in more detail in Chapter 5, one of the largest impacts the statistical community