8
Computer Hardware and Software for the Generation of Virtual Environments

The computer technology that allows us to develop three-dimensional virtual environments (VEs) consists of both hardware and software. The current popular, technical, and scientific interest in VEs is inspired, in large part, by the advent and availability of increasingly powerful and affordable visually oriented, interactive, graphical display systems and techniques. Graphical image generation and display capabilities that were not previously widely available are now found on the desktops of many professionals and are finding their way into the home. The greater affordability and availability of these systems, coupled with more capable, single-person-oriented viewing and control devices (e.g., head-mounted displays and hand-controllers) and an increased orientation toward real-time interaction, have made these systems both more capable of being individualized and more appealing to individuals.

Limiting VE technology to primarily visual interactions, however, simply defines the technology as a more personal and affordable variant of classical military and commercial graphical simulation technology. A much more interesting, and potentially useful, way to view VEs is as a significant subset of multimodal user interfaces. Multimodal user interfaces are simply human-machine interfaces that actively or purposefully use interaction and display techniques in multiple sensory modalities (e.g., visual, haptic, and auditory). In this sense, VEs can be viewed as multimodal user interfaces that are interactive and spatially oriented. The human-machine interface hardware that includes visual and auditory displays as well as tracking and haptic interface devices is covered in Chapters 3, 4, and 5.



In this chapter, we focus on the computer technology for the generation of VEs. One possible organization of the computer technology for VEs is to decompose it into functional blocks. In Figure 8-1, three distinct classes of blocks are shown: (1) rendering hardware and software for driving modality-specific display devices; (2) hardware and software for modality-specific aspects of models and the generation of corresponding display representations; and (3) the core hardware and software in which modality-independent aspects of models, as well as consistency and registration among multimodal models, are taken into consideration.

FIGURE 8-1 Organization of the computer technology for virtual reality.

Reading Figure 8-1 from left to right, human sensorimotor systems, such as eyes, ears, touch, and speech, are connected to the computer through human-machine interface devices. These devices generate output to, or receive input from, the human as a function of sensory modality drivers or renderers. The auditory display driver, for example, generates an appropriate waveform based on an acoustic simulation of the VE. To generate the sensory output, a computer must simulate the VE for that particular sensory mode. For example, a haptic display may require a physical simulation that includes compliance and texture.
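
To make the decomposition in Figure 8-1 concrete, the sketch below (not from the report) shows one way the block classes might map onto code: a modality-independent core model that several modality-specific drivers read when producing their display representations. All class and method names (CoreModel, ModalityDriver, VisualDriver, AuditoryDriver) are hypothetical.

```cpp
// Illustrative sketch only: a minimal C++ rendering of the functional
// decomposition in Figure 8-1.  The names used here are assumptions, not
// part of any system described in the chapter.
#include <cstdio>
#include <memory>
#include <vector>

// Core, modality-independent world state (block class 3 in Figure 8-1).
struct CoreModel {
    double objectPosition[3] = {0.0, 0.0, 0.0};
    double time = 0.0;
};

// A modality-specific driver/renderer (block classes 1 and 2): it derives a
// display representation from the shared model and pushes it to its device.
class ModalityDriver {
public:
    virtual ~ModalityDriver() = default;
    virtual void renderFrame(const CoreModel& model) = 0;
};

class VisualDriver : public ModalityDriver {
public:
    void renderFrame(const CoreModel& model) override {
        std::printf("visual: draw object at (%.1f, %.1f, %.1f)\n",
                    model.objectPosition[0], model.objectPosition[1],
                    model.objectPosition[2]);
    }
};

class AuditoryDriver : public ModalityDriver {
public:
    void renderFrame(const CoreModel& model) override {
        // A real driver would synthesize a waveform from an acoustic simulation.
        std::printf("audio: synthesize waveform for t = %.2f s\n", model.time);
    }
};

int main() {
    CoreModel model;
    std::vector<std::unique_ptr<ModalityDriver>> drivers;
    drivers.push_back(std::make_unique<VisualDriver>());
    drivers.push_back(std::make_unique<AuditoryDriver>());

    // Each modality is refreshed from the same modality-independent model,
    // which is what keeps the displays consistent and registered.
    for (int frame = 0; frame < 3; ++frame) {
        model.time = frame / 10.0;
        for (auto& d : drivers) d->renderFrame(model);
    }
    return 0;
}
```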

An acoustic display may require sound models based on impact, vibration, friction, fluid flow, etc. Each sensory modality requires a simulation tailored to its particular output. Next, a unified representation is necessary to coordinate the individual sensory models and to synchronize output for each sensory driver. This representation must account for all human participants in the VE, as well as for all autonomous internal entities. Finally, gathered and computed information must be summarized and broadcast over the network in order to maintain a consistent distributed simulated environment.

To date, much of the design emphasis in VE systems has been dictated by the constraints imposed by generating the visual scene; the nonvisual modalities have been relegated to special-purpose peripheral devices. Similarly, this chapter is primarily concerned with the visual domain, and material on other modalities can be found in Chapters 3-7. However, many of the issues involved in the modeling and generation of acoustic and haptic images are similar to those in the visual domain, and the implementation requirements for interacting, navigating, and communicating in a virtual world are common to all modalities. Such multimodal issues will no doubt be merged into a more unitary computational system as the technology advances.

In this section, we focus on the computer technology for the generation of VEs. The computer hardware used to develop three-dimensional VEs includes high-performance workstations with special components for multisensory displays, parallel processors for the rapid computation of world models, and high-speed computer networks for transferring information among participants in the VE. The implementation of the virtual world is accomplished with software for interaction, navigation, modeling (geometric, physical, and behavioral), communication, and hypermedia integration. Control devices and head-mounted displays are covered elsewhere in this report.

VE systems require high frame rates and fast response because of their inherently interactive nature. The concept of frame rate comes from motion picture technology: in a motion picture presentation, each frame is really a still photograph, and if a new photograph replaces the older image in quick succession, the illusion of motion is engendered. The update rate is defined to be the rate at which display changes are made and shown on the screen. In keeping with the original motion picture technology, the ideal update rate is 20 frames (new pictures) per second or higher; the minimum acceptable rate for VE is lower, reflecting the trade-offs between cost and tolerances. With regard to computer hardware, there are several senses of frame rate: they are roughly classified as graphical, computational, and data access. Graphical frame rates are critical in order to sustain the illusion of presence or immersion in a VE.

Note that these frame rates may be independent: the graphical scene may change without new computation or data access, due simply to the motion of the user's point of view. Experience has shown that, whereas the graphical frame rate should be as high as possible, frame rates lower than 10 frames per second severely degrade the illusion of presence. If the graphics being displayed rely on computation or data access, then computation and data access frame rates of 8 to 10 frames per second are necessary to sustain the visual illusion that the user is watching the time evolution of the VE.

Fast response times are required if the application allows interactive control. It is well known (Sheridan and Ferrell, 1974) that long response times (also called lag or pure delay) severely degrade user performance. These delays arise in the computer system from such factors as data access time, computation time, and rendering time, as well as from delays in processing data from the input devices. As in the case of frame rates, the sources of delay are classified into data access, computation, and graphical categories. Although delays are clearly related to frame rates, they are not the same: a system may have a high frame rate, but the image being displayed or the computational result being presented may be several frames old. Research has shown that delays of longer than a few milliseconds can measurably impact user performance, whereas delays of longer than a tenth of a second can have a severe impact. The frame rate and delay required to create a measurable impact will in general depend on the nature of the environment. Relatively static environments with slowly moving objects are usable with frame rates as low as 8 to 10 frames/s and delays of up to 0.1 s. Environments with objects exhibiting high frequencies of motion (such as a virtual handball game) will require very high frame rates (>60 Hz) and very short delays. In all cases, however, if the frame rate falls below 8 frames/s, the sense of an animated three-dimensional environment begins to fail, and if delays become greater than 0.1 s, manipulation of the environment becomes very difficult. We summarize these results as the following constraints on the performance of a VE system: frame rates must be greater than 8 to 10 frames/s, and total delay must be less than 0.1 s.

Both the graphics animation and the reaction of the environment to user actions require extensive data management, computation, graphics, and network resources, and all operations that take place to support the environment must operate within the above time constraints. Although one can imagine a system that would have the graphics, computation, and communications capability to handle all environments, such a system is beyond current technology. For a long time to come, the technology necessary will generally depend on the application domain for which the VE is being built.
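
As a rough illustration of the two constraints stated above, the following sketch times each pass through a VE-style main loop and flags violations. The thresholds come from the text; the loop structure and the simulated 30 ms workload are assumptions made only for illustration.

```cpp
// A minimal sketch (not from the chapter) of checking a VE main loop against
// the quoted performance constraints: frame rate above roughly 8-10 frames/s
// and end-to-end delay below 0.1 s.
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    using clock = std::chrono::steady_clock;
    const double minFrameRateHz  = 10.0;  // lower bound from the chapter
    const double maxDelaySeconds = 0.1;   // upper bound from the chapter

    for (int frame = 0; frame < 5; ++frame) {
        auto inputTime = clock::now();    // moment the user input arrives

        // Stand-in for data access, computation, and rendering work.
        std::this_thread::sleep_for(std::chrono::milliseconds(30));

        auto displayTime = clock::now();  // moment the new image is shown
        double delay =
            std::chrono::duration<double>(displayTime - inputTime).count();
        double frameRate = 1.0 / delay;   // one frame per loop iteration here

        std::printf("frame %d: %.1f frames/s, %.3f s delay -> %s\n",
                    frame, frameRate, delay,
                    (frameRate >= minFrameRateHz && delay <= maxDelaySeconds)
                        ? "within VE constraints" : "constraint violated");
    }
    return 0;
}
```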

Real-world simulation applications will be highly bound by the graphics and network protocols and by consistency issues; information visualization and scientific visualization applications will be bound by computational performance and will involve issues of massive data management (Bryson and Levit, 1992; Ellis et al., 1991). Some applications, such as architectural visualization, will require photorealistic rendering; others, such as information display, will not. Thus the particular hardware and software required for VE implementation will depend on the application domain targeted. There are, however, some commonalities of hardware and software requirements, and it is those commonalities on which we focus in our examination of the state of the art of computer hardware and software for the construction of real-time, three-dimensional virtual environments.

HARDWARE FOR COMPUTER GRAPHICS

The ubiquity of computer graphics workstations capable of real-time, three-dimensional display at high frame rates is probably the key development behind the current push for VEs. We have had flight simulators with significant graphics capability for years, but they have been expensive and not widely available. Even worse, they have not been readily programmable. Flight simulators are generally constructed with a specific purpose in mind, such as providing training for a particular military plane. Such simulators are microcoded and programmed in assembly language to reduce the total number of graphics and central processing unit cycles required. Systems programmed in this manner are difficult to change and maintain, and hardware upgrades for such systems are usually major undertakings with a small customer base. An even larger problem is that the software and hardware developed for such systems are generally proprietary, thus limiting the availability of the technology. In the last 5 years, the graphics workstation has begun to supplant the special-purpose hardware of the flight simulator, and it has provided an entry pathway for the large numbers of people interested in developing three-dimensional VEs. The following section is a survey of computer graphics workstations and graphics hardware that are part of the VE development effort.

Notable Graphics Workstations and Graphics Hardware

Graphics performance is difficult to measure because of the widely varying complexity of visual scenes and the different hardware and software approaches to computing and displaying visual imagery. The most straightforward measure is polygons per second, but this gives only a crude indication of the scene complexity that can be displayed at useful interactive update rates.

Polygons are the most common building blocks for creating a graphic image. It has been said that visual reality is 80 million polygons per picture (Catmull et al., 1984). If we wish photorealistic VEs at 10 frames/s, this translates into 800 million polygons/s. No current graphics hardware provides this, so we must make approximations for the moment. This means living with less detailed virtual worlds, perhaps via judicious use of hierarchical data structures (see the software section below) or by off-loading some of the graphics requirements onto available CPU resources instead. For the foreseeable future, multiple-processor workstations will play a role in off-loading graphics processing. Moreover, the world modeling components, the communications components, and the other software components for creating virtual worlds also require significant CPU capacity. Although we focus on graphics initially, it is important to note that it is the way world modeling effects picture change that is of ultimate importance.

Graphics Architectures for VE Rendering

This section describes the high-level computer architecture issues that determine the applicability of a graphics system to VE rendering. Two assumptions are made about the systems included in our discussion. First, they use a z-buffer (or depth buffer) for hidden surface elimination. A z-buffer stores the depth, or distance from the eye point, of the closest surface "seen" at each pixel. When a new surface is scan converted, the depth at each pixel is computed. If the new depth at a given pixel is closer to the eye point than the depth currently stored in the z-buffer at that pixel, then the new depth and intensity information are written into both the z-buffer and the frame buffer; otherwise, the new information is discarded and the next pixel is examined. In this way, nearer objects always overwrite more distant objects, and when every object has been scan converted, all surfaces have been correctly ordered in depth. The second assumption is that these graphics systems use an application-programmable, general-purpose processor to cull the database, so that the rendering hardware receives only the graphics primitives that are within the viewing volume (a pyramid for perspective projections or a parallelepiped for parallel projections). Both of these assumptions are valid for commercial graphics workstations and for the systems designed by researchers at the University of North Carolina at Chapel Hill.
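
The z-buffer test described above can be summarized in a few lines of code. The sketch below is illustrative only; the buffer layout and the Fragment type are assumptions, not a description of any particular workstation's hardware.

```cpp
// A minimal sketch of the z-buffer rule: write a fragment only if it is
// nearer to the eye point than what the pixel already holds.
#include <cstdio>
#include <limits>
#include <vector>

struct Fragment {
    int x, y;        // window coordinates
    float depth;     // distance from the eye point
    unsigned color;  // packed intensity value
};

struct FrameBuffer {
    int width, height;
    std::vector<float> zbuffer;   // closest depth seen so far at each pixel
    std::vector<unsigned> color;  // intensity of that closest surface

    FrameBuffer(int w, int h)
        : width(w), height(h),
          zbuffer(w * h, std::numeric_limits<float>::infinity()),
          color(w * h, 0) {}

    // Nearer surfaces always overwrite farther ones; farther ones are discarded.
    void resolve(const Fragment& f) {
        int i = f.y * width + f.x;
        if (f.depth < zbuffer[i]) {
            zbuffer[i] = f.depth;
            color[i] = f.color;
        }
    }
};

int main() {
    FrameBuffer fb(640, 512);
    fb.resolve({100, 100, 5.0f, 0xff0000ffu});  // far surface arrives first
    fb.resolve({100, 100, 2.0f, 0x00ff00ffu});  // nearer surface overwrites it
    fb.resolve({100, 100, 9.0f, 0x0000ffffu});  // farther surface is discarded
    std::printf("pixel depth = %.1f\n", fb.zbuffer[100 * 640 + 100]);  // 2.0
    return 0;
}
```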

The rendering operation is composed of three stages: per-primitive, rasterization, and per-fragment, as shown in Figure 8-2.

FIGURE 8-2 The graphics pipeline.

Per-primitive operations are those performed on the points, lines, and triangles that are presented to the rendering system. These include transformation of vertices from object coordinates to world, eye, view volume, and eventually window coordinates; lighting calculations at each vertex; and clipping to the visible viewing volume. Rasterization is the process of converting the window-coordinate primitives to fragments corresponding to the pixels held in the frame buffer. The frame buffer is a dedicated block of memory that holds intensity and other information for every pixel on the display surface; it is scanned repeatedly by the display hardware to generate visual imagery. Each fragment includes x and y window coordinates, a color, and a depth for use with the z-buffer for hidden surface elimination. Finally, per-fragment operations include comparing the fragment's depth value to the value stored in the z-buffer and, if the comparison is successful, replacing the color and depth values in the frame buffer with the fragment's values.

The performance demanded of such a system can be substantial: 1 million triangles per second or hundreds of millions of fragments per second. The calculations involved in performing this work easily require billions of operations per second. Since none of today's fastest general-purpose processors can satisfy these demands, all modern high-performance graphics systems are run on parallel architectures. Figure 8-3 is a general representation of a parallel architecture, in which the rendering operation of Figure 8-2 is simply replicated. Although such an architecture is attractively simple to implement, it fails to solve the rendering problem, because primitives in object coordinates cannot easily be separated into groups corresponding to different subregions of the frame buffer; there is in general a many-to-many mapping between the primitives in object coordinates and the partitions of the frame buffer. To allow for this many-to-many mapping, the disjoint parallel rendering pipes must be combined at a minimum of one point along their paths, and this point must come after the per-primitive operations are completed. The crossbar can be located prior to rasterization (the primitive crossbar), between rasterization and per-fragment processing (the fragment crossbar), or following pixel merge (the pixel merge crossbar).
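
A minimal sketch of the per-primitive vertex path described above (object to world to eye to clip to window coordinates) follows. The matrices, the 90-degree field of view, and the 1,280 × 1,024 viewport are illustrative assumptions; lighting and clipping are omitted.

```cpp
// Illustrative per-primitive stage only: transform one vertex from object
// coordinates to window coordinates.
#include <array>
#include <cstdio>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) r[i] += m[i][j] * v[j];
    return r;
}

int main() {
    // Model (object->world) and view (world->eye) are identity in this sketch.
    Mat4 identity{{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    // A simple perspective projection (eye -> clip coordinates), 90-degree
    // field of view, near = 1, far = 100.
    double n = 1.0, f = 100.0;
    Mat4 proj{{{1, 0, 0, 0},
               {0, 1, 0, 0},
               {0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)},
               {0, 0, -1, 0}}};

    Vec4 objectVertex{0.5, 0.25, -10.0, 1.0};  // a vertex in object coordinates
    Vec4 eye  = mul(identity, mul(identity, objectVertex));
    Vec4 clip = mul(proj, eye);

    // Perspective divide, then the viewport transform to a 1,280 x 1,024 window.
    double xNdc = clip[0] / clip[3], yNdc = clip[1] / clip[3];
    double xWin = (xNdc * 0.5 + 0.5) * 1280.0;
    double yWin = (yNdc * 0.5 + 0.5) * 1024.0;
    std::printf("window coordinates: (%.1f, %.1f)\n", xWin, yWin);
    return 0;
}
```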

FIGURE 8-3 Parallel graphics pipelines.

A detailed discussion of these architectures is provided in the technical appendix to this chapter. Four major graphics systems represent the different architectures based on crossbar location: the Silicon Graphics RealityEngine is a flow-through architecture with a primitive crossbar; the Freedom series from Evans & Sutherland is a flow-through architecture with a fragment crossbar; Pixel Planes 5 uses a tiled primitive crossbar; and PixelFlow is a tiled, pixel merge machine. Ordered rendering has been presented to help clarify a significant distinction in graphics architectures; however, it is not the only significant factor for VE rendering. Other primary issues for VE rendering are image quality, performance, and latency. Measured by these metrics, RealityEngine and PixelFlow are architecturally very effective VE machines; Freedom and Pixel Planes 5 are less suitable, though still useful.

Computation and Data Management Issues in Visual Scene Generation

Many important applications of VE require extensive computational and data management capabilities. The computations and data primarily support the tasks taking place in the application. For example, in simulation the computations may support the physical behavior of objects in the VE, while in a visualization application they may support the extraction of interesting features from a complex precomputed dataset. Such computations may require on the order of millions of floating point operations. Simulations currently demand only modest data management capabilities, but as the complexity of simulations increases, the data supporting them may increase as well.

Visualization applications, in contrast, often demand a priori unpredictable access to gigabytes of data (Bryson and Gerald-Yamasaki, 1992). Other types of applications can have similar demands. As computer power increases, more ambitious computational demands will be made. For example, an application may someday compute a fluid flow solution in real time to high accuracy. Such computations can require trillions of floating point operations.

An Example: The Virtual Wind Tunnel

In this section, we consider the implications of the VE performance constraints for the computation and data management requirements of a VE system. An example of an application that is both computationally intensive and works with large amounts of data is the virtual wind tunnel (Bryson and Gerald-Yamasaki, 1992). A modest modern problem in the virtual wind tunnel is the visualization of a precomputed dataset that gives five values (one for energy, one for density, and three for the velocity vector) at each of 3 million points, for 106 time steps. This dataset is a total of 5.3 Gbytes in size, with each time step being about 50 Mbytes. If the virtual wind tunnel is to allow the user to interactively control the time-varying visualization of this dataset, each time step must be loaded and the visualizations computed. Assuming that 10 time steps must be loaded per second, a data bandwidth of 500 Mbytes per second is required.

The computations involved depend on the visualization technique. For example, the velocity vector field can be visualized by releasing simulated particles into the flow, which implies a computation requiring about 200 floating point operations per particle per time step. A typical visualization requires thousands of such particles and therefore hundreds of thousands of floating point operations per time step. The computation problem expands further as such visualizations are combined with other computationally intensive visualization techniques, such as the display of isosurfaces. It is important to stress that this example is only of modest size, with the size and complexity of datasets doubling every year or so.

It is quite difficult to meet the VE performance constraints and the data management requirements of the above example simultaneously. There are two aspects to the data management problem: (1) the time required to find the data in a mass storage device (seek time), which results in delays, and (2) the time required to read the data (bandwidth). The seek time can range from minutes for data stored on tape, through a few hundred thousandths of a second for data stored on disk, to essentially nothing for data stored in primary memory. Bandwidths range from a few megabytes per second for tapes and disks to on the order of a hundred megabytes per second for RAID disks and physical memory.
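
The arithmetic in the wind tunnel example can be checked with a few lines of code. The sketch below uses the chapter's own figures as inputs; the particle count of 2,000 is an assumption standing in for "thousands of such particles."

```cpp
// A back-of-the-envelope check of the virtual wind tunnel figures quoted above.
#include <cstdio>

int main() {
    const double pointCount  = 3.0e6;   // grid points per time step
    const double valuesPerPt = 5.0;     // energy, density, 3 velocity components
    const double stepBytes   = 50.0e6;  // ~50 Mbytes per step (as quoted)
    const double timeSteps   = 106.0;
    const double stepsPerSec = 10.0;    // interactive playback rate

    std::printf("dataset size : %.1f Gbytes\n", stepBytes * timeSteps / 1.0e9);
    std::printf("bandwidth    : %.0f Mbytes/s\n", stepBytes * stepsPerSec / 1.0e6);
    std::printf("values/step  : %.0f million\n", pointCount * valuesPerPt / 1.0e6);

    // Particle tracing: ~200 floating point operations per particle per step.
    const double flopsPerParticle = 200.0;
    const double particles        = 2000.0;  // assumed "thousands of particles"
    std::printf("particle cost: %.0f thousand flops per step\n",
                flopsPerParticle * particles / 1.0e3);
    return 0;
}
```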

Disk bandwidth is not expected to improve significantly over the next few years. Support is needed to meet the requirements of VE applications for real-time random access to as much as several gigabytes of data (Bryson and Gerald-Yamasaki, 1992). Whereas some visualization techniques address only a small number of data values at a time, a very large number of such accesses may be required for data that are scattered over the file on disk, so the seek time of the disk head becomes an important issue. For other visualization techniques (such as isosurfaces or volume rendering), many tens of megabytes of data may be needed for a single computation; this implies disk bandwidths of 300 to 500 Mbytes/s in order to maintain a 10 Hz update rate, an order of magnitude beyond current commercial systems. For these types of applications, physical memory is the only viable storage medium for data used in the environment. Workstations are currently being released with as much as 16 Gbytes of memory, but the cost of such large amounts of memory is currently prohibitive. Furthermore, as computational science grows with the increase in supercomputer power, datasets will increase dramatically in size. Another source of large datasets will be the Earth Observing Satellite, which will produce datasets in the terabyte range. Data on this scale mandate very fast mass storage devices as a necessary technology for the application of VEs to these problems.

Strategies for Meeting Requirements

One strategy for meeting the data management requirements is to observe that, typically, only a small fraction of the data is actually used in an application. In the particle injection example above, only 16 accesses (each loading a few tens of bytes) are required per particle per time step. These accesses are scattered across the dataset in unpredictable ways. The bandwidth requirements of this example are trivial if only the data actually used are loaded, but the seek time requirements are a problem: 20,000 particles would require 320,000 seeks per time step, or 3.2 million seeks per second. This is two orders of magnitude beyond the seek time capabilities of current disk systems.
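
The seek-rate arithmetic quoted above works out as follows; the sketch simply restates the chapter's figures and derives the implied time budget per seek.

```cpp
// A quick check of the seek-rate arithmetic, using the chapter's own figures
// (20,000 particles, 16 accesses per particle, 10 time steps per second).
#include <cstdio>

int main() {
    const double particles      = 20000.0;
    const double accessesEach   = 16.0;   // per particle per time step
    const double stepsPerSecond = 10.0;   // interactive update rate

    const double seeksPerStep   = particles * accessesEach;       // 320,000
    const double seeksPerSecond = seeksPerStep * stepsPerSecond;  // 3.2 million
    const double budgetPerSeek  = 1.0 / seeksPerSecond;           // seconds

    std::printf("seeks per time step : %.0f\n", seeksPerStep);
    std::printf("seeks per second    : %.1f million\n", seeksPerSecond / 1.0e6);
    std::printf("time budget per seek: %.2f microseconds\n", budgetPerSeek * 1.0e6);
    return 0;
}
```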

Another way to address the data size problem is to develop data compression algorithms. The data will be decompressed as they are used, trading reduced data size for greater computational demands. Different application domains will make different demands of compression algorithms: image data allow "lossy" compression, in which the decompressed data are of slightly lower fidelity than the original; scientific data cannot allow lossy compression (as this would introduce incorrect artifacts into the data) but may allow multiresolution compression algorithms, such as wavelet techniques. The development of appropriate data compression techniques for many application domains is an open area of research.

Another strategy is to put as much of the dataset as possible in physical memory. This minimizes the seek time but restricts the amount of data that may be investigated. This restriction will be relieved as workstation memories increase (see Figure 8-4). Datasets, however, are expected to grow radically as the available computational power increases.

FIGURE 8-4 History of workstation computation and memory.

Computational requirements can be similarly difficult to meet. The above example of injecting 20,000 particles into a flow requires 4 million floating point operations per time step, implying a computational performance of 40 million floating point operations per second (40 Mflops) just to compute the particle visualization. Such an application will often use several such visualizations simultaneously. As more computational power becomes available, we may wish to include partial differential equation solvers, increasing the computational requirements by several orders of magnitude. There are many ways in which supercomputer systems have attained very high computational speeds, but these methods typically work only for special computations. For example, Cray supercomputers rely on a vectorized architecture, which is very fast for array-type operations but is

by industry. However, it seems likely that, once the novelty wears off, industry interest will wane. Thus it is unlikely that the private sector will take on long-term development efforts in the absence of standards. Nevertheless, high-level interface issues should be explored. Specifically, research should be performed to examine how to use data measuring the positions of the user's body for interaction with the VE in a way that truly provides the richness of real-world interaction. Critical concerns are how to apply user tracking data and how to define objects in the VE to ensure natural interaction.

One of the major research challenges, with both hardware and software implications, is the continued use of the RS-232C interface for control devices. Current workstation technology typically provides one or two such ports. Control devices are usually attached to these ports, with commands sent via the UNIX write system call. There is a speed limitation on the use of these ports, a limitation often seen as latency in input response. It is not uncommon to hear 70 ms touted as the fastest response from the time of input device movement to the reporting of the change back to the application running on the workstation. That 70 ms is too long a delay for real-time interaction, for which a maximum of 10 ms is more appropriate. There is the additional problem of the UNIX system software layers that must be traversed for events to be reported back to the VE application. Current workstation manufacturers do not focus on the design of such high-speed ports, and even within one manufacturer there is no guarantee that such ports will behave consistently across differing models of workstations. Real standards and highly engineered ports are needed for control devices. In fact, a revolutionary redesign and restandardization of the input port is required if control devices are to take off. In addition, we need to rethink the layers of the VE system architecture.

Visual Scene Navigation Software

Given current workstation graphics polygon filling capabilities and the extrapolation of those speeds into the future, software solutions will be needed for some time to come to reduce the total number of graphics primitives sent through the graphics pipeline. The difficulty of polygon flow minimization depends on the composition of the virtual world. This problem has historically been approached on an application-specific basis, and there is as yet no general solution. Current solutions usually involve partitioning the polygon-defined world into volumes that can readily be checked for visibility by the virtual world viewer. There are many partitioning schemes, some of which work only if the world description does not change dynamically. We need to encourage research into generalizing this software so that dynamically changing worlds can be constructed.
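
The cell-partitioning approach described in the preceding paragraph can be illustrated with a deliberately crude visibility test: group polygons into cells with bounding spheres and skip whole cells that lie behind the viewer or beyond a far distance. The cell layout, the test itself, and all numbers below are illustrative assumptions rather than any particular published partitioning scheme.

```cpp
// Illustrative sketch of volume partitioning for polygon flow minimization:
// whole cells are culled when a cheap visibility test against the viewer fails.
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { double x, y, z; };

struct Cell {
    Vec3 center;       // center of the cell's bounding sphere
    double radius;     // bounding-sphere radius
    int polygonCount;  // polygons stored in this cell
};

// Conservative-style test: keep the cell if any part of its bounding sphere
// could lie in front of the viewer and within the far distance.
bool possiblyVisible(const Cell& c, const Vec3& eye, const Vec3& viewDir,
                     double farDist) {
    Vec3 d{c.center.x - eye.x, c.center.y - eye.y, c.center.z - eye.z};
    double along = d.x * viewDir.x + d.y * viewDir.y + d.z * viewDir.z;
    double dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return along > -c.radius && dist - c.radius < farDist;
}

int main() {
    std::vector<Cell> world = {
        {{0, 0, -20},  5.0, 12000},  // in front of the viewer: keep
        {{0, 0,  30},  5.0,  8000},  // behind the viewer: cull
        {{0, 0, -900}, 5.0, 15000},  // beyond the far distance: cull
    };
    Vec3 eye{0, 0, 0}, viewDir{0, 0, -1};

    int sent = 0;
    for (const auto& c : world)
        if (possiblyVisible(c, eye, viewDir, 500.0)) sent += c.polygonCount;
    std::printf("polygons sent to the pipeline: %d of 35000\n", sent);
    return 0;
}
```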

Furthermore, there is a need to encourage the funding of research to reach a common, open solution for polygon flow minimization. Researchers who have tackled polygon flow minimization have closely guarded their developed code. In fact, most software source code developed under university research contract in the United States today is held as proprietary by the universities, even if that code was developed under government contract. This fact, coupled with the stated goal of federal agencies of recouping their investments, is counterproductive and disturbing. The unavailability of such software increases the overall development time and cost of progress in the technology, as researchers duplicate software development, and these redevelopment efforts also slow the progress of new development.

There are additional important technical issues in polygon flow minimization. One of these, the generation of multiple-resolution three-dimensional icons, is a closely related technological challenge: much of the work on polygon flow minimization assumes that multiple-resolution, lower-polygon-count, three-dimensional icons are available. This is a large assumption, and automatic methods for generating multiple-resolution three-dimensional icons remain an open issue. There is some work in this area, and it is recommended that a small research program be developed to encourage more (DeHaemer and Zyda, 1991; Schroeder et al., 1992; Turk, 1992). In fact, the development of such public software and a public domain set of three-dimensional clip models with geometry and associated behavior could go a long way toward encouraging the creation of three-dimensional VEs.
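
Once multiple-resolution three-dimensional icons exist, using them is straightforward; the sketch below shows a typical distance-based selection of a lower-polygon-count version of an object. The thresholds and polygon counts are illustrative assumptions, not figures from the chapter.

```cpp
// Illustrative level-of-detail selection: pick a coarser version of an icon
// as the object recedes from the viewer, reducing polygon flow.
#include <cstdio>
#include <vector>

struct LevelOfDetail {
    double maxDistance;  // use this version while the object is nearer than this
    int polygonCount;    // polygons in this version of the icon
};

// Versions ordered from most to least detailed.
int selectPolygons(const std::vector<LevelOfDetail>& lods, double distance) {
    for (const auto& lod : lods)
        if (distance <= lod.maxDistance) return lod.polygonCount;
    return 0;  // beyond the last threshold: do not draw the object at all
}

int main() {
    std::vector<LevelOfDetail> chair = {
        {10.0, 4000},   // full-detail model, used close up
        {50.0, 600},    // simplified model
        {200.0, 40},    // coarse stand-in
    };
    for (double d : {5.0, 30.0, 120.0, 500.0})
        std::printf("distance %5.1f -> draw %d polygons\n",
                    d, selectPolygons(chair, d));
    return 0;
}
```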

Modeling

Simulation Frameworks

Research into the development of environments in which object behavior, as well as object appearance, can rapidly be specified is an area that needs further work. We call this area simulation frameworks. Such a framework makes no assumptions about the actual behavior, just as graphics systems currently make no assumptions about the appearance of graphical objects. A good term for what a simulation framework is trying to accomplish is meta-modeling. Such frameworks would facilitate the sharing of objects between environments and allow the establishment of object libraries. Issues to be researched include the representation of object behavior and how different behaviors are to be integrated into a single system.

Geometric Modeling

Because geometric modeling is integral to the construction of VEs, its current limitations are limits on development. As a practical matter, the VE research community needs a shared, open modeling environment that includes physical and behavioral modeling. The current state of the art in VE technology is to use available CAD tools, tools more suited to two-dimensional displays. The main problem with CAD tools is not getting the three-dimensional geometry out of the CAD files but rather the fact that data related to the actual physics of the three-dimensional objects modeled by the CAD systems are not usually present in such files. In addition, the partitioning information useful for real-time walkthrough of these data usually has to be added later by hand or fed back in by specially written programs. CAD files also have the problem that their file formats are proprietary. An open VE CAD tool should be developed for use by the VE research community. This tool should incorporate many of the three-dimensional geometric capabilities of current CAD systems as well as physics and other VE-relevant parameters (e.g., three-dimensional spatial partitioning embedded into the output databases). It should also capture parameters relevant to the haptic and auditory channels.

Vision-Based Model Acquisition

Although CAD systems are useful for generating three-dimensional models of new objects, using them can be tedious: currently, modelers sit for hours detailing each door, window, and pipe of a three-dimensional building. VEs could be much more widely used if this painful step could be automated, perhaps via laser range finders and the right "surface generation to CAD primitive" software. Unfortunately, the multiple-view, laser range image correlation problem is very hard. Automatic model acquisition would be a good first step toward providing the three-dimensional objects for virtual worlds; however, the physics of the objects scanned would still need to be added. This technology has many uses beyond developing VEs. An additional application area of high interest is providing CAD plans for older buildings, structures designed and constructed before the advent of CAD systems.

Augmented Reality

Real-time augmented reality is one of the tougher problems in VE research. The two major issues are (1) accurate measurement of observer motion and (2) acquisition and maintenance of scene models. The prospects for automatic solutions to scene model acquisition and maintenance were discussed above. The problems of measuring observer motion are more difficult and represent a major research area. Although VE displays provide direct measurements of observer movement, these are unlikely to be accurate enough to support high-quality augmented reality in situations in which real and synthetic objects are in close proximity; even very small errors could induce perceptible relative motions that could disrupt the illusion. Perhaps the most promising course would be to use direct motion measurements for gross positioning and to use local image-based matching methods to lock real and synthetic elements together.

TECHNICAL APPENDIX

Graphics Architectures for VE Rendering

The rendering operation has three stages: per-primitive, rasterization, and per-fragment. Because of the performance demand, all modern high-performance graphics systems are run on parallel architectures. To allow the many-to-many mapping between primitives and frame buffer partitions, the parallel rendering pipes must be combined at one point along their paths. The three possible locations for the crossbar are illustrated in Figures 8-6 through 8-8.

The primitive crossbar (Figure 8-6) broadcasts window-coordinate primitives from the engines that transformed and lighted them to the one or more rasterization engines that correspond to the frame buffer regions that each primitive intersects. Depending on the window-coordinate size of a primitive, it might be processed by just one rasterization engine or by all of the rasterization engines; thus this crossbar is really a one-to-many bus.

FIGURE 8-6 Primitive broadcast.
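
The one-to-many behavior of the primitive crossbar can be sketched as a bounding-box overlap test against the frame buffer regions owned by each rasterization engine. The 2 × 2 engine layout and screen size below are assumptions made only for illustration.

```cpp
// Illustrative sketch of primitive broadcast: send a window-coordinate
// primitive to every rasterization engine whose frame buffer region its
// bounding box overlaps.
#include <cstdio>
#include <vector>

struct Box { int x0, y0, x1, y1; };  // window-coordinate bounding box

// Which rasterization engines must receive this primitive?
std::vector<int> enginesFor(const Box& prim, int screenW, int screenH,
                            int enginesX, int enginesY) {
    std::vector<int> result;
    int regionW = screenW / enginesX, regionH = screenH / enginesY;
    for (int ey = 0; ey < enginesY; ++ey) {
        for (int ex = 0; ex < enginesX; ++ex) {
            Box region{ex * regionW, ey * regionH,
                       (ex + 1) * regionW - 1, (ey + 1) * regionH - 1};
            bool overlaps = prim.x0 <= region.x1 && prim.x1 >= region.x0 &&
                            prim.y0 <= region.y1 && prim.y1 >= region.y0;
            if (overlaps) result.push_back(ey * enginesX + ex);
        }
    }
    return result;
}

int main() {
    // 1,280 x 1,024 screen split among four engines in a 2 x 2 grid.
    Box smallTriangle{100, 100, 180, 150};    // fits in one region
    Box largeTriangle{200, 200, 1100, 900};   // spans all four regions
    for (const Box* b : {&smallTriangle, &largeTriangle}) {
        auto engines = enginesFor(*b, 1280, 1024, 2, 2);
        std::printf("primitive goes to %zu engine(s):", engines.size());
        for (int e : engines) std::printf(" %d", e);
        std::printf("\n");
    }
    return 0;
}
```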

The fragment crossbar (Figure 8-7) is a true, one-to-one crossbar connection. Each fragment generated by a rasterization engine is directed to the one fragment processor that manages the corresponding pixel in the frame buffer. The fragment crossbar is thus itself more easily parallelized than the primitive crossbar, allowing for the necessarily greater bandwidth of rasterized fragments over window-coordinate primitives. The primary disadvantage of the fragment crossbar compared with the primitive crossbar is that fragment crossbar systems have difficulty rendering primitives in the order that they were presented to the graphics system, whereas primitive broadcast systems easily render primitives in the order presented.

FIGURE 8-7 Fragment crossbar.

Whereas the frame buffers in the primitive broadcast and fragment crossbar systems are disjoint, collectively forming a single, screen-size buffer, the frame buffers of a pixel crossbar system (Figure 8-8) are each complete, screen-size buffers. The contents of these buffers are merged only after all of the primitives have been rendered into one of the buffers. The primary advantage of such a system over primitive and fragment crossbars is that pixel merge, using the z-buffer algorithm to choose the final pixel value, is infinitely extensible with no performance loss. Again, the term crossbar is misleading, since the pixel merge can be accomplished with one-to-one paths between adjacent buffer pairs.

FIGURE 8-8 Pixel merge.
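
A pixel merge step of the kind described above amounts to a per-pixel z comparison between complete, screen-size buffers, as in the sketch below (toy buffer sizes and values; not a description of PixelFlow's hardware).

```cpp
// Illustrative pixel merge: each renderer produces a full-screen color and
// depth buffer, and the buffers are combined with the z-buffer rule.
#include <cstdio>
#include <limits>
#include <vector>

struct ScreenBuffer {
    int width, height;
    std::vector<float> depth;
    std::vector<unsigned> color;
    ScreenBuffer(int w, int h)
        : width(w), height(h),
          depth(w * h, std::numeric_limits<float>::infinity()),
          color(w * h, 0) {}
};

// Merge src into dst: for every pixel, keep whichever surface is nearer.
void pixelMerge(ScreenBuffer& dst, const ScreenBuffer& src) {
    for (size_t i = 0; i < dst.depth.size(); ++i) {
        if (src.depth[i] < dst.depth[i]) {
            dst.depth[i] = src.depth[i];
            dst.color[i] = src.color[i];
        }
    }
}

int main() {
    // Two renderers, each having rendered its own fraction of the primitives.
    ScreenBuffer a(4, 1), b(4, 1);
    a.depth = {1.0f, 9.0f, 5.0f, std::numeric_limits<float>::infinity()};
    a.color = {11, 12, 13, 0};
    b.depth = {3.0f, 2.0f, 5.5f, 7.0f};
    b.color = {21, 22, 23, 24};

    pixelMerge(a, b);  // the merged image ends up in buffer a
    for (int i = 0; i < 4; ++i) std::printf("pixel %d: color %u\n", i, a.color[i]);
    return 0;  // expected colors: 11 22 13 24
}
```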

The primary disadvantage of pixel merge systems is the requirement for large, duplicate frame buffers. A secondary disadvantage exists only with respect to primitive broadcast systems: the pixel crossbar, like the fragment crossbar, has difficulty rendering primitives in the order presented. (Each path renders the primitives presented to it in the order that they are presented, but the postrendering pixel merge cannot be done in order.) The primary disadvantage, frame buffer size, can be mitigated by reducing the size of each frame buffer to a subregion of the final display buffer. If this is done, the complete scene must be rendered with multiple rasterization passes, with the subbuffers being merged into the final, full-size display buffer after each pass is completed.

Application of such a multipass technique introduces the second differentiator of parallel graphics systems: whether the rendering is flow-through or tiled. Flow-through systems complete the processing of each primitive soon after that primitive is presented to the rendering system, where "soon" is a function of the number of processing steps. Tiled systems accumulate all the primitives of a scene after the per-primitive processing is complete and only then begin the rasterization and per-fragment processing. They must do this because frame buffer tiles are allocated temporally rather than spatially and so are not available in the random sequence in which the primitives arrive. The primary disadvantage of tiled systems relative to flow-through systems is therefore one of increased latency, due to the serialization of the processing steps.

The third major differentiator is image quality: does the architecture support mapping images onto geometry (texture mapping), and is the sampling quality of both these images and the geometry itself high (anti-aliasing)? This differentiator is less one of architecture than of implementation: primitive, fragment, and pixel crossbar systems, both flow-through and tiled, can be implemented with or without texture mapping and anti-aliasing. The final differentiator is performance: the number of primitives and fragments that can be processed per second. Again, this differentiator is less one of architecture than of implementation, although at the limit the pixel merge architecture will exceed the capabilities of primitive broadcast and fragment crossbar architectures.

Now we consider the architectures of four modern graphics systems in terms of these differentiators. The Silicon Graphics RealityEngine is a flow-through architecture with a primitive crossbar. It is therefore able to efficiently render primitives in the order that they are presented and has low rendering latency. RealityEngine supports texture mapping and anti-aliasing of points, lines, and triangles and is therefore considered to have high rendering image quality.

RealityEngine processes up to 1 million texture-mapped, anti-aliased triangles/s and up to 250 million texture-mapped, anti-aliased fragments/s. It is able to generate 1,280 × 1,024 scenes of high quality at up to 30 frames/s.

Freedom series graphics from Evans & Sutherland use a flow-through architecture with a fragment crossbar. Thus Freedom machines also have low rendering latency but are less able than the RealityEngine to efficiently render primitives in the order that they are presented. Freedom machines support texture mapping and can anti-alias points and lines, but they are unable to efficiently anti-alias surface primitives such as triangles; hence the rendering quality of Freedom machines for full-frame solid images is relatively low. Although exact numbers for Freedom fragment generation/processing rates are not published, the literature suggests that this rate for texture-mapped fragments is in the tens of millions per second rather than in the hundreds of millions. If that is the case, the performance of Freedom graphics is not sufficient to generate 1,280 × 1,024 images at even 10 frames/s, the absolute minimum for interactive performance.

Pixel Planes 5, the currently operational product of the University of North Carolina's research efforts, uses a tiled, primitive crossbar architecture. Because the architecture is tiled, the advantage of ordered rendering typical of primitive crossbar systems is lost. Also, the tiling contributes to a latency of up to 3 frames, which is substantially greater than the single-frame latencies of the Freedom and RealityEngine systems. The rendering performance, especially the effective fragment generation/processing rate, is substantially greater than that of either the Freedom or the RealityEngine, resulting in easily maintained image generation at 1,280 × 1,024 and 30 frames/s. However, Pixel Planes 5 cannot anti-alias geometry at these high rates, so the image quality is lower than that of the RealityEngine.

Finally, PixelFlow, the proposed successor to Pixel Planes 5, is a tiled, pixel merge machine. Thus it is unable to efficiently render primitives in the order in which they are received, and the rendering latency of PixelFlow is perhaps twice that of Freedom and RealityEngine, though less than that of Pixel Planes 5. PixelFlow is designed to support both texture mapping and anti-aliasing at interactive, though reduced, rates, resulting in a machine that can produce high-quality 1,280 × 1,024 frames at 30 or even 60 frames/s.

Silicon Graphics from the IRIS-1400 to the RealityEngine 2

Silicon Graphics, Inc., a computer manufacturer, creates visualization systems with some of the more flexible and powerful digital media capabilities in the computer industry, combining advanced three-dimensional graphics, digital multichannel audio, and video in a single package.

Silicon Graphics systems serve as the core of many VE systems, performing simulation, visualization, and communication tasks. In such a role, it is critical that the systems support powerful computation; stereoscopic, multichannel video output; and fast input/output (I/O) for connectivity to sensors, control devices, and networks (for multiparticipant VEs). Textured polygon fill capability is also one of the company's strengths with respect to virtual worlds, in that texturing enhances realism.

In support of this role, Silicon Graphics has been developing multiprocessor graphics workstations at the leading edge of technology since late 1983. A brief look at the graphics performance numbers of its high-end systems since that time is warranted (Table 8-1). Those systems comprise three generations, as described in the RealityEngine Graphics paper (Akeley, 1993): the 1000, 2000, and G are first generation; the GTX, VGX, and VGXT are second generation; and the RealityEngine and RealityEngine2 are third generation. Performance is listed for first-, second-, and third-generation operations for all these machines. Notice that the curve for first-generation performance falls off with the second- and third-generation machines, because they are not optimized for first-generation rendering.

Onyx RealityEngine2

In January 1993, Silicon Graphics announced the Onyx line of graphics supercomputers, which incorporate a new multiprocessing architecture, PowerPath2, to combine up to 24 parallel processors based on the MIPS R4400 RISC CPU operating at 150 MHz. I/O bandwidth is rated at 1.2 Gbytes/s to and from memory, with support for the VME64 64-bit bus operating at 50 Mbytes/s. Onyx systems can utilize up to three separate graphics pipelines based on the new RealityEngine2 graphics subsystem. This new graphics system offers 50 percent higher polygon performance than the original RealityEngine introduced in July 1992: RealityEngine2 is rated at 2 million flat triangles/s and 900,000 textured, Gouraud shaded, anti-aliased, fogged, z-buffered triangles/s. The optional MultiChannel board enables users to take the frame buffer and send different regions out to different display devices; thus, a single 1.3-million-pixel frame buffer could be used either as one 1,280 × 1,024 display or as four 640 × 512 displays. The MultiChannel option provides up to six separate outputs.

TABLE 8-1 Performance History for SGI Graphics

                Generic             Depth buffered,      Depth buffered, lighted,
                rendering           lighted, Gouraud     Gouraud shaded, anti-aliased,
                                    shaded               texture mapped
System  Date    Points Tris  Pixels   Tris    Pixels       Tris    Pixels
1000    1983    0.06   0.001 40       n/a     n/a          n/a     n/a
2000    1984    0.07   0.01  46       0.0008  0.1          n/a     n/a
G       1986    0.14   0.01  130      0.003   2.0          n/a     n/a
GTX     1988    0.45   0.135 80       0.135   40           n/a     n/a
VGX     1990    1.5    1.1   200      0.8     100          (0.08)  (10)
VGXT    1991    1.5    1.1   200      0.8     100          (0.08)  (50)
RE      1992    2.0    1.4   380      1.4     380          0.6     250
RE2     1993    3.0    2.0   380      2.0     380          1.0     250

NOTE: The first three numeric columns ("Generic rendering") are for rendering with no depth buffer, no lighting or shading, and no texture mapping or anti-aliasing. All values are in millions per second (points, triangles, or rendered pixels). Values in parentheses are texture mapped but not anti-aliased. All of these systems were introduced in the $50,000 to $100,000 range. Tris = triangle meshes; Pixels = pixel fill rate.

Evans & Sutherland Freedom 3000

Evans & Sutherland (E&S), an old-line flight simulator company, has recently announced the Freedom series of graphics accelerators targeted at the Sun Microsystems Sparc 10 line of workstations. The Freedom series offers a wide range of performance levels, from 500,000 polygons per second for the Freedom 1000 to 3 million polygons per second for the Freedom 3000. The Freedom series uses standard hardware and software interfaces to join seamlessly with the Sun Microsystems environment. The Freedom accelerators are programmable with Sun's standard interfaces and are software-compatible with workstations currently available from E&S and Sun.

The Freedom 3000 has 1,280 × 1,024, 1,536 × 1,280, and high-definition TV display formats. It also supports hardware texture mapping, including MIP-mapping, at resolutions up to 2,000 × 2,000. Additional features include anti-aliased lines, dots, and polygons; alpha buffering; accumulation buffering; 128 bits per pixel; and dynamic pixel allocation. The Freedom 3000 contains the following technology: five proprietary VLSI ASIC chip types using 0.8 µm CMOS, a parallel array of programmable high-speed microprocessors (DSPs), a very fast proprietary graphics bus (G-bus) capable of speeds well beyond 3 million polygons/s, high-speed pixel routing interconnection, high-speed access to the frame buffer for image processing (up to 100 million pixels/s), and a pixel fill rate of 95 million pixels/s.

Graphics Hardware from the University of North Carolina, Chapel Hill: PixelPlanes 4, 5, and PixelFlow

The University of North Carolina at Chapel Hill is one of the last schools still developing graphics hardware. Its efforts differ widely from what has been attempted in the commercial world, since the work is more basic research than machine production. Despite this research focus, the machines developed by Fuchs, Poulton, Eyles, and their colleagues have been close to the leading edge of graphics hardware at each of their prototypical stages (Fuchs et al., 1985, 1989). Pixel Planes 4 had a 27,000 polygons/s capability in 1988, and a follow-on machine, Pixel Planes 5, was first shown in 1991 with a 1 million polygons/s capability. The latest machine, PixelFlow, is still under development but shows great promise (Molnar et al., 1992); it is expected to be working in 1994. PixelFlow and its graphics performance scalability are an important part of the future of high-performance three-dimensional VEs. PixelFlow, an architecture for high-speed image generation, overcomes the transformation and frame-buffer access bottlenecks of conventional hardware rendering architectures (Molnar et al., 1992).

It uses the technique of image composition, distributing the rendering task over an array of identical renderers, each of which computes a full-screen image of a fraction of the primitives. A high-performance image-composition network combines these images in real time to produce an image of the entire scene. Image composition architectures offer performance that scales linearly with the number of renderers: a single PixelFlow renderer rasterizes up to 1.4 million triangles/s, and an n-renderer system can rasterize at up to n times this basic rate. It is expected that a 128-renderer PixelFlow system will be capable of a polygon rate approaching 100 million triangles/s.

PixelFlow performs anti-aliasing by supersampling. It supports deferred shading with separate hardware shaders that operate on composite images containing intermediate pixel data. PixelFlow shaders compute complex shading algorithms and procedural and image-based textures in real time, with the shading rate independent of image complexity. A PixelFlow system can be coupled to a parallel supercomputer to serve as an immediate-mode graphics server, or it can maintain a display list for retained-mode rendering.