National Academies Press: OpenBook
Suggested Citation: "3 Research Priorities." National Research Council. 2007. A Research Agenda for Geographic Information Science at the United States Geological Survey. Washington, DC: The National Academies Press. doi: 10.17226/12004.


3 Research Priorities

This chapter addresses the committee’s third task: to make recommendations regarding the most effective research areas for the Center of Excellence for Geospatial Information Sciences (CEGIS) to pursue. The need for prioritization is the clear driver for this study because, as noted earlier, there are many more research challenges than even the most optimistic assessment of CEGIS’s future resources can support. The committee has already established the need for, and recommended an initial focus on, research to support The National Map (Chapter 2). This chapter describes and recommends research priorities under that overarching theme. Although other research topics such as visualization, cognition, and land use or land cover change are very important, the committee believes that focusing on enhancing The National Map will make the best use of initial efforts while leaving open the possibility of expanding, as resources allow, to other topics mentioned by McMahon et al. (2005).

The chapter has two parts. The first part describes the committee’s approach to determining priorities and applies the resulting prioritization criteria to yield an initial set of priority research areas for CEGIS. The second part delves more deeply into priority research topics that fit within each of the three general research areas and demonstrates how these priorities are interrelated within The National Map. In the long run, this set of priorities will have to adapt to changing U.S. Geological Survey (USGS) needs and resources.

PRIORITY RESEARCH AREAS

This section defines and applies criteria for the prioritization of CEGIS research. The committee deliberated on candidate criteria based on information from meeting participants, interviews, and other inputs (Appendix B). The criteria not only help define broad research areas but also point to more specific priorities among focused topics within these areas; consequently, they are used again later in this chapter. The committee’s eight prioritization criteria for CEGIS research follow:

Prioritization Criteria for CEGIS Research

1. Importance to The National Map. The National Map is a critical product and service of the USGS and, in particular, of the National Geospatial Program Office (NGPO). Consequently, an initial research emphasis on serving the needs of The National Map is a high priority. Furthermore, if the research is applied to enhancing The National Map, its results will provide a visible, high-profile measure of its success.
2. Importance to USGS disciplines. After The National Map, the most important constituencies for CEGIS are the USGS disciplines. Discipline needs and The National Map needs are not mutually exclusive: the new capabilities for The National Map described in Chapter 2 are envisioned to serve the disciplines and multidisciplinary interactions.
3. Relevance to society. CEGIS serves not only the USGS but also the nation. Its research projects will have to demonstrate high relevance to society.
4. Solves a problem and targets a customer. At this early stage in CEGIS’s evolution and with limited resources, CEGIS will have to focus on applied research with measurable payoff. Solving key customers’ problems should receive high priority.
5. Foundational, understandable, and generalizable. CEGIS’s most important projects will be those that solve problems in geographic information science (GIScience) that have general applicability to the field and are easily comprehensible by users and customers. One measure of success for this criterion would be acceptance of CEGIS research results in peer-reviewed publications.
6. Enables multidisciplinary integration. Because CEGIS’s research has a wide variety of users, the most effective research will be that which serves the widest breadth of users and supports an “enterprise solution.”
7. Focus on content. Content is the defining ingredient provided by the USGS, whether from The National Map or elsewhere. CEGIS’s research will need to focus on content-related issues. CEGIS may at times do conceptual design of tools, but tool development is considered part of development engineering.
8. Potential for early, visible success. As with any organization, CEGIS has the strongest prospects for longevity and value to the USGS if it achieves and builds on early successes. CEGIS will need to target programs with this in mind.

It is important to note that these criteria are intended only as a starting point for CEGIS. From here it is essential that CEGIS continue to review this prioritization and take it to the next level of detail to resolve further trade-offs about what to do first within the available resources. These criteria point toward a program of research areas, with underlying focused topics, that supports users of The National Map data content and produces visible results in a short period of time.

Research Areas

Three broad research areas emerged from the committee’s deliberations on the eight prioritization criteria:

1. Investigating New Methods for Information Access and Dissemination. Access to information content is a key success factor at many levels for The National Map. The USGS disciplines need effective data access to carry out their missions. Other federal and state agencies need effective interfaces to The National Map content so that their organizations can maximize productivity when working with national and local data. This priority also supports society in general, because citizens need a trusted, up-to-date, flexible, and easy-to-use source of geospatial data for the nation. In addition, this is an area with potential for visible early success, enabling interim milestones in CEGIS’s longer-term research agenda.
2. Supporting Integration of Data from Multiple Sources. Given the diversity of source data from state and local agencies, the many add-on themes, and the desire for multidisciplinary research across the USGS, efficient and accurate data integration is fundamental to the effectiveness of The National Map. Within the USGS, researchers in the various disciplines will need to find common reference data in The National Map and be able to load and share their data. Furthermore, the types of models and forms of spatial analysis increasingly needed to solve social and environmental problems will require that spatial data sets can be integrated on the fly. CEGIS will need to find solutions for integrating data with different semantics; widely varying quality and scale; and differing spatial and temporal granularity and resolution.
3. Developing Data Models and Knowledge Organization Systems. To support society in general, The National Map will need both the semantic flexibility of a well-designed framework and models that enable a variety of user requests for information and information products. This objective will likely require the most research effort, but it will deliver enormous power to The National Map applications and clearly differentiate The National Map from other web-based products.

Because all three research areas are core GIScience research areas of general interest to the broad GIScience community in addition to the USGS (see, e.g., DiBiase et al., 2006), CEGIS will be able to leverage ongoing research activities in this broader community. The arguments presented above lead to the following recommendation:

RECOMMENDATION 2: The three priority research areas for CEGIS should be (1) information access and dissemination, (2) integration of data from multiple sources, and (3) data models and knowledge organization systems.

PRIORITY RESEARCH TOPICS

Authorities were notified early yesterday of a fire raging in the hills around San Diego. The local fire district office immediately accessed The National Map and displayed a topographic map of the area, including known fire trails in the hills and water resources. Given the terrain, fuel supply, and impending weather, the team realized that it had a very difficult challenge on its hands and that team members would be depending on technology, as well as the hard work of their crews, to deal with the crisis.

To bring the discussion of a GIScience research agenda to life, we have woven firefighting and fire management examples into the remainder of this chapter in the form of a scenario: the use of The National Map to manage and fight a wildfire in San Diego, California. Geospatial information and tools are useful in wildfire risk assessment, modeling, monitoring, firefighting, emissions modeling, and burn scar mapping (Rothermel, 1972; Radke, 1995; Clinton et al., 2006; Gong et al., 2006).
Firefighting can benefit from accurate static geospatial data (e.g., topography) as well as dynamic information (e.g., fuel and weather) viewed in a spatial context. Improved data access, data integration, and data modeling and knowledge organization are all key to an enhanced National Map that can more effectively serve fire management applications as well as many others. (Note that these scenarios are intended for illustration purposes only and are not intended to reflect actual current or planned capabilities.)

Each of the recommended broad research areas from the previous section encompasses a range of focused research topics. These topics also need to be prioritized for CEGIS’s research portfolio. The following three subsections describe these research topics in detail and, drawing again on the prioritization criteria listed earlier, recommend the two highest-priority topics under each research area. The order of these three subsections is driven by which research areas are most likely to yield early “wins” for CEGIS; consequently, the subsections progress from near-term toward longer-term and more challenging research.

All of the research topics identified could span a broad range from basic to applied research. To provide context for the state of the art, the discussion generally begins with a description of the basic nature of the topic and lists references to relevant research. However, the recommended research questions focus on applied research, because they are motivated by the goals of The National Map and are therefore aimed specifically at advancing its capabilities. Of course, the application of this research does not stop with The National Map; it will also serve the other USGS disciplines as well as other agencies and users in the field. In this way, the leadership of the USGS and NGPO is demonstrated not only by the creation of a powerful National Map, but also by the far-reaching influence and value of the applied research the agency conducts.

Each subsection provides a general explanation of the problem; describes the relationship of the focused research topics to the USGS context (their relevance to The National Map, NGPO, and/or any of the USGS disciplines); and describes the maturity of the problem, the approximate time frame to complete the research (near term or longer term), and the organizations in which the research center of gravity resides. Although the committee did not evaluate the potential duration of research projects in great detail, in general near term is considered to be one to four years, and longer term four to eight years.

Three presentation tools are used in this section to help clarify the main points and tie the material together. First, the specific research questions offered under each topic as starting points for CEGIS research are collected in a summary table in the final section of the chapter. Second, the wildfire management and operations scenario introduced above is revisited in each subsection to illustrate how the proposed research relates to an operational application. Third, the committee uses Figure 3.1 to illustrate how the research areas and topics are linked in the context of The National Map.

Figure 3.1 illustrates the relationship of the recommended research topics to the overall framework of The National Map introduced in Figure 2.1, addressing most of its components. Colored boxes are research topics that would add a new capability or feature to The National Map, and the three colors correspond to the three research areas discussed in this chapter. The pink boxes and arrows indicate research topics covered in the section on Information Access and Dissemination. The blue boxes and arrows indicate research topics covered in the section on Integration of Data from Multiple Sources. The yellow boxes and arrows indicate research topics covered in the section on Data Models and Knowledge Organization Systems. The committee’s six recommended priority research topics for CEGIS are in bold in Figure 3.1. Box 3.1 describes how research in these areas would enhance the capabilities and functionality of The National Map.

FIGURE 3.1 A potential framework for The National Map of the future and areas of GIScience research for CEGIS that will fuel its evolution. Recommended priority research topics are in bold within the colored boxes and arrows. This framework is adapted from that in Figure 2.1 and emphasizes the aspects of the diagram relating to The National Map, not those relating to the National Atlas. NOTE: API = application programming interface; CSW = Catalogue Service for the Web; EPA = Environmental Protection Agency; NASA = National Aeronautics and Space Administration; NOAA = National Oceanic and Atmospheric Administration; OGC = Open Geospatial Consortium; WCS = Web Coverage Service; WFS = Web Feature Service; WMS = Web Map Service.
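The interfaces named in the figure note are OGC web service standards. To make concrete what a client request to one of them looks like, here is a minimal sketch that builds a WMS 1.3.0 GetMap URL using only the Python standard library. The function name `wms_getmap_url`, the endpoint `https://example.gov/wms`, and the layer names are hypothetical placeholders, not actual National Map services. The sketch also shows the kind of detail a standard profile would need to pin down: WMS 1.3.0 follows the axis order of the requested coordinate reference system, so a bounding box given as west, south, east, north must be reordered for EPSG:4326 (latitude first) but not for CRS:84.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# CRSs whose WMS 1.3.0 axis order is latitude first. Only EPSG:4326 is listed
# in this sketch; a real client should look up each CRS's axis order.
LAT_FIRST_CRS = {"EPSG:4326"}


def wms_getmap_url(endpoint, layers, bbox, width, height,
                   crs="CRS:84", fmt="image/png", transparent=True):
    """Build an OGC WMS 1.3.0 GetMap request URL.

    bbox is (west, south, east, north) in the units of `crs`; it is
    reordered to (south, west, north, east) for latitude-first CRSs.
    """
    west, south, east, north = bbox
    if crs in LAT_FIRST_CRS:
        ordered = (south, west, north, east)
    else:
        ordered = (west, south, east, north)
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests each layer's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in ordered),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and layers; the box roughly covers San Diego.
    url = wms_getmap_url("https://example.gov/wms",
                         ["hydrography", "transportation"],
                         (-117.3, 32.5, -116.8, 33.0), 800, 800,
                         crs="EPSG:4326")
    print(url)
    # Decode the query string to see what a server would receive.
    print(parse_qs(urlparse(url).query, keep_blank_values=True)["BBOX"])
```

Building the request from a parameter dictionary keeps the standard’s required keys in one place, which is roughly what a profile for The National Map would constrain (allowed layers, CRSs, formats).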

Box 3.1 Benefits of the Research Topics to The National Map

Information Access and Dissemination
Reinvented Topographic Maps
• Provide easy public access to a valuable USGS product
User-Centered Design
• Improves usability of the human interface
• Easy access to high-quality maps in various media
• High-quality printing for all users
OGC Standard Profiles
• Facilitate a systematic framework for a distributed National Map computing system

Integration of Data from Multiple Sources
Data Fusion
• Integration of dissimilar data types enriches The National Map database
• Facilitates integration of local data with various scales, types, etc.
Generalization
• Allows automatic scaling of output to user’s needs

Data Models and Knowledge Organization Systems
Geographic Feature Ontologies
• Specify feature semantics for richer data models
Ontology-Driven Data Models and Gazetteers
• Organize data to support queries by place name, feature types, feature parts, and multiple representations
Quality-Aware Data Models
• Add ability to automatically assess quality of diverse input data
Data Models for Time and Change
• Analyze and track land feature changes
Transaction Processing
• Supports frequent data updates from distributed sources

The current version of The National Map is an excellent field test of a prototype, or beta version, from which to build. The current National Map implementation reveals its strengths as well as its limitations. In fact, the only way to understand the highly complex information system design needs of The National Map is probably to field a prototype and measure its good and bad points. To break through the technology barriers that stand between the current National Map and the way it is envisioned in Figure 3.1, a thorough review of the system design is warranted.
The committee has suggested one possible scenario (Box 2.2) based on current capabilities and trends, but in the long term The National Map system design team within the USGS would feed requirements-based research challenges to CEGIS, which would undoubtedly result in adjustments to the set of priorities listed in this chapter.

The National Map of the future is envisioned to be a highly dynamic and flexible transactional information system. Transactions occur on both the input and output sides of the system. On the input side is the powerful concept of seamlessly integrating local, feature-level data into the database in real time. On the output side, a wide diversity of users access data, use tools to analyze data geospatially and temporally, and build products and tools on top of The National Map in sessions with application programming interfaces. The magnitude and breadth of these transactions define the potential value of The National Map, but they also create an information system design challenge.

In addition to evolving capabilities in GIScience research and technology, broader trends on the web¹ will inevitably affect the architecture of The National Map, because the web is its delivery platform. These trends will, for example, push an enhanced National Map toward a service-oriented rather than system-oriented approach; collective intelligence rather than a single knowledge base; data as the driving force; lightweight user interfaces and development models for fast and reliable system performance; mapping software that supports multiple devices (e.g., personal digital assistant [PDA], cellular phone); and direct feedback opportunities that support rich user experiences and user participation.

Information Access and Dissemination

Wildfires are spreading rapidly across a San Diego mountainside. Firefighters have deployed with two-way radios and Global Positioning System (GPS) receivers.
In the command center, computers display new three-dimensional topographic maps overlaid with near-real-time airborne color-infrared thermal imagery, real-time GPS wireless sensor data, and National Weather Service maps of wind direction, precipitation potential, and temperature. These displays allow the command center team to tell the firefighters where the wildfire boundaries are and help them estimate the likely direction and speed of fire spread over the next two hours. The operators find it intuitive to toggle between the various data layers to analyze the situation, and they can select different combinations to produce PDF files for fast printing and distribution to the crews. Meanwhile, GPS and wireless communication transmit the crews’ positions back to the command center, where a large screen displays overview maps with the current positions of all firefighters and the current fire perimeters. Using comprehensive geographic information system (GIS) modeling technology and information from The National Map (topography, slope, aspect, weather, soil moisture, vegetation, etc.), the command center calculates potential dangers to firefighters and immediately warns the crews on the west side of the mountain to relocate 300 m farther west. Based on the overview maps, the center also dispatches another crew to the highest-risk zone and moves two more toward that zone. The team’s earlier participation in design phases is paying off in powerful but easy-to-use geospatial tools in a frantic and hostile environment.

1 See http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.

A well-designed and user-friendly web mapping service is essential for effective use of USGS data and map products. Designing web-based USGS mapping applications is a great challenge because users can change the contents immediately by manipulating map browsers through simple operations such as zooming in, zooming out, or changing layers. Communication between map producers and map users has never been as immediate or as important as it is now in the rapidly expanding web mapping environment. The three GIScience research topics described in this section contribute to improved web services and map display, with the ultimate goal of improving the accessibility and usability of USGS products. These topics, listed in the committee’s recommended priority order (based on the criteria presented earlier) and beginning with the highest priority, are:

1. Innovative formats and designs to reinvent topographic maps in an electronic environment;
2. User-centered design (UCD) for implementation of The National Map web services; and
3. Open Geospatial Consortium (OGC) Standard Profiles for The National Map web mapping services and map layer design.

This subsection covers each of these foci in the order presented above. The first two topics need immediate action by CEGIS because of their fundamental value to users of The National Map and the potential for near-term, visible success.

RECOMMENDATION 3: The two priority research topics within the area of information access and dissemination should be to reinvent topographic maps in an electronic environment and to investigate user-centered design for The National Map web services.
Priority CEGIS Research Topic: Innovative Formats and Designs to Reinvent Topographic Maps in an Electronic Environment

Topographic maps are one of the most important products of the USGS and The National Map. They were established in the nineteenth century (Thompson, 1988) and are the USGS’s most recognized and popular map product. In the digital mapping age, CEGIS has the opportunity to conduct research that will transform the well-designed traditional paper topographic maps into an electronic, web-based, multipurpose utility. Effective delivery of topographic maps will serve both the general public and professionals who use this information as a base layer for analyses. This work requires immediate attention by CEGIS and can be accomplished in the short term (one to four years), drawing in particular on the expertise of USGS’s many well-trained cartographers in collaboration with software vendors that have established technologies for map display. As illustrated in the vignette at the start of this section, well-designed three-dimensional electronic topographic maps will become a critical source of information in such applications as wildfire spread prediction and emergency response.

Two research foci are of particular and immediate value to the cartographic display of The National Map: (1) development of PDF topographic maps for wide distribution and (2) development of foreground and background data layers for control of visual hierarchies in each of the eight data layers for which USGS has responsibility in The National Map. These two foci arise because the available methods for creating online topographic maps with The National Map viewers are fairly complicated for public use: users must often select layers and symbols from among hundreds of choices, or they are confronted with a map made with all themes as strong, high-contrast symbols, sometimes with confusing color choices (Figure 3.2).

PDF Topographic Maps. In the simplest case, CEGIS could develop PDF topographic maps with an associated specialized map viewer. PDF is preferred because it retains the resolution needed in print products, is widely distributed, and would accommodate viewing, saving, and printing maps by users with only minimal computing capabilities. All topographic map symbols and layer contents are predefined by USGS cartographers.
Topographic map symbol colors, widths, textures, shapes, and sizes could mimic the existing map style if scale is restricted to 1:24,000, for example. CEGIS research needs to address design changes that accommodate changes in scale and resolution. Existing point, line, and area elements can be used at a range of scales and resolutions with minor adjustment to symbols and selection of features (Brewer and Buttenfield, 2007). Changes in symbol size and shape, line width, use of outlines, color, transparency, and texture all extend the readability of map data without requiring geometric changes through generalization.
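The idea that scale change can be handled by symbol adjustment plus feature selection, without geometric generalization, can be sketched in a few lines of Python. The feature types, the scale thresholds at which each is dropped, the 0.1 mm legibility floor, and the fourth-root width scaling are all hypothetical values chosen for illustration, not USGS specifications.

```python
from dataclasses import dataclass
from typing import Dict

DESIGN_SCALE = 24_000   # scale (1:24,000) the existing symbol widths were designed for
MIN_WIDTH_MM = 0.1      # hypothetical legibility floor for lines


@dataclass(frozen=True)
class SymbolRule:
    feature_type: str
    max_denominator: int   # feature is dropped at scales smaller than 1:max_denominator
    base_width_mm: float   # line width at the design scale


# Hypothetical rules for illustration only; not USGS specifications.
RULES = (
    SymbolRule("interstate", 2_000_000, 0.8),
    SymbolRule("contour", 250_000, 0.15),
    SymbolRule("local road", 100_000, 0.3),
    SymbolRule("intermittent stream", 50_000, 0.2),
)


def symbolize(scale_denominator: int) -> Dict[str, float]:
    """Select feature types for a scale and adjust line widths; geometry is untouched."""
    # Thin lines gently as the map gets smaller (larger denominator), thicken as it grows.
    factor = (DESIGN_SCALE / scale_denominator) ** 0.25
    styles = {}
    for rule in RULES:
        if scale_denominator <= rule.max_denominator:          # feature selection
            width = round(rule.base_width_mm * factor, 2)       # symbol adjustment
            styles[rule.feature_type] = max(MIN_WIDTH_MM, width)
    return styles
```

Under these assumed rules, symbolize(24_000) returns every feature type at its design width, while symbolize(500_000) keeps only the interstate, drawn thinner than its 0.8 mm design width. Tested thresholds from user studies would replace these guesses.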

FIGURE 3.2 Multicolor forest fragmentation theme (upper pane) that interferes with the base information overlaid on it (shown separately in the lower pane). Red, blue, and black symbols in the forest theme are the same colors as the base elements, making the base elements ineffective location cues despite their simple and consistent display choices. SOURCE: USGS (http://nationalatlas.gov).

Research Question: What is the widest range of scales that can be mapped only by adjusting map symbols combined with selectively removing feature types? (short term)

Research Question: What is the minimum amount of change to map symbols and content that provides the maximum scale range while maintaining topographic map usability? (short term)

Advanced use of PDF to deliver topographic maps could make use of Optional Content Groups (OCGs) (Adobe Systems Incorporated, 2004). OCGs allow groups of graphics to be set to “visible” or “invisible” by viewers. These dynamic changes may be used to mimic GIS map layer visibility settings or symbol redesign suited to smaller scales and coarse-resolution viewing. For example, within one PDF file delivered to the user, a subset of layers may be set to invisible when zoomed out, or larger labels may be shown for readability at smaller scales. USGS cartographers have accumulated substantial cartographic knowledge on the design of topographic maps from their past work with paper topographic maps, and this knowledge can be adapted and revised for the design of PDF topographic maps.

The reinvented USGS electronic topographic map will also need to integrate GPS and accommodate various display devices. For example, the new topographic maps could include a version designed for display on portable computers and personal navigation devices (with screen resolutions of 300 × 240 pixels) for general hiking purposes. The same electronic map might be converted to a design suited to a very large high-definition screen (3000 × 2000 pixels) for conference and meeting presentations.

Research Question: What is the stability of topographic map design (with the goal of establishing a coherent set of designs that function from coarse to fine resolutions through scale change)? (short term)

CEGIS’s goal could be to automate these adjustments so that map users are not faced with many symbol options for every device and scale at which they request topographic maps. CEGIS does not currently have any research in this area, but it could draw on USGS’s long experience in paper map design as a starting point.

Visual Hierarchy.
Even without enumerating all possible users and uses for spatial data, CEGIS can help reinvent the topographic map by investigating how to offer the base layers for which USGS has responsibility (elevation, hydrography, transportation, boundaries, orthoimagery, land cover, structures, and names) as both foreground and background information, allowing flexible, user-determined combinations. Point, line, and area elements within each theme that forms a map can be the emphasis of mapmaking (foreground), or they can be background information on which other information is overlaid. Either map elements can adjust automatically to serve as foreground or background, or users can be given the option to choose between the two. Without the ability to change the prominence of elements, map users will not be able to make readable or presentable maps unless they have GIS or graphics software and understand how to download, open, and re-symbolize features. This requirement leaves behind a majority of users.

The combination of forest fragmentation information with a road and stream base map shown in Figure 3.2 demonstrates this need for controlling symbol design combinations. The National Atlas viewer offers the base information in a seemingly obvious set of symbols (Figure 3.2, lower pane), but a single colorful overlay undermines the utility of these choices: the red used for roads and the blue used for hydrography also appear in the forest symbols, which use a rainbow of colors common in mapping scientific data. Any color and line weight choice limits the options for representing other data, so basic style sets that can be selected to produce a usable appearance are needed. This is not a matter of refining the beauty of the display; it is the difference between offering information and offering an unusable mass of data.

One example of a research application in this area among the 2007 CEGIS-funded projects is the provision of Internet flood mapping and flood warning layers. In these maps, the detailed hydrographic data would be foreground, with locational information provided as background (base data).

Developing a visual hierarchy avoids the complexity of offering a design suited to each application, such as hiking, bus routing, or voter districting. There are, in fact, more than 100 GIS uses at the county level (Halsing et al., 2004), and customized template designs for each use would produce a cumbersome web interface. In addition, emphasis on foreground designs for each base layer offers a structured approach to custom design.

Research Question: What should be the visual hierarchies for the base National Map layers? (short term)

Visual hierarchy conventions are already being developed for road maps, with the private sector leading the way on implementation. Contemporary examples of symbolization schemes for roads include that of Google Maps, which was established earlier in print in the National Geographic Road Atlas (previously the GeoSystems Road Atlas) and in European topographic mapping.
These designs use color and line width redundantly, with a hierarchy of oranges and yellows for wider main roads and narrow gray or white lines for local roads, and they offer a sound implementation of visual variable principles long established in cartographic design (Bertin, 1983; MacEachren, 1995; Brewer, 2005). Orange-yellow-white is a series of adjacent hues that also change in lightness and saturation, using all three color dimensions simultaneously to build maximum contrast between symbols for enhanced readability. This combination, shown in Figure 3.3, is becoming a widely understood and clearly organized symbol set. In the Google Maps road symbol set, hydrography in blue and parks in green are examples of background information presented in desaturated, low-contrast colors that push them into the background of the map. Brown urban areas and grayish-blue water in Figure 3.3 are similarly background information.
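The principle that background themes keep their hue but lose saturation and contrast can be expressed as a small color transformation. The sketch below uses Python’s standard colorsys module to lighten and desaturate a color in hue-lightness-saturation space. The hex values in ROAD_HIERARCHY, the default lighten and desaturate amounts, and the function name are assumptions for illustration; they are not the Google Maps or USGS palettes.

```python
import colorsys

# Hypothetical foreground road hierarchy: adjacent hues (orange, yellow, white) that also
# change in lightness and saturation, with line widths decreasing down the hierarchy.
ROAD_HIERARCHY = {
    "major highway": ("#f5a000", 4.0),
    "arterial": ("#ffd84d", 3.0),
    "local road": ("#ffffff", 1.5),
}


def to_background(hex_color: str, lighten: float = 0.5, desaturate: float = 0.6) -> str:
    """Push a theme color toward the background by raising lightness and lowering saturation."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l = l + (1.0 - l) * lighten   # move lightness part of the way toward white
    s = s * (1.0 - desaturate)    # remove part of the saturation; hue is unchanged
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "#" + "".join(f"{min(255, max(0, round(c * 255))):02x}" for c in (r, g, b))


# A saturated blue becomes a pale grayish blue that still reads as water.
hydrography_background = to_background("#1f78b4")
```

Keeping the hue fixed preserves the conventional association (blue for water, green for parks) while the drop in saturation and contrast leaves the orange-yellow-white road hierarchy visually on top.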

FIGURE 3.3 Example map display with a detailed hierarchy of roads shown in different widths and a range of colors from orange through yellow to white. SOURCE: Original graphic made from National Atlas data sources.

Because so much work has already been done on visual hierarchies for roads, the transportation layer of The National Map is a likely candidate for early development of visual hierarchies to be used in The National Map viewer and products. The other major data themes will have to be reviewed to determine which can also build on existing work.

No automated procedures exist for designating the most relevant themes for a map purpose and allocating them to foreground and background through symbol design. Expert users could make these distinctions by manually setting priorities for each layer: they could set an order of importance for themes or a legend order, or choose among starting templates that include some initial priorities and then add themes to the template. Alternatively, for the nonspecialist, the web viewer could analyze the requested combination of information and select visual priorities based on a likely mixture. For example, a wetlands theme would prompt detailed hydrography and background representation of boundaries, while population themes would prompt hydrography to be displayed with background symbolization. User profiles could also be included in procedures for setting symbolization hierarchies.

Research Question: How should USGS select a subset of automated and manual approaches to visual hierarchies to provide a tool that effectively serves the largest number and variety of National Map users seeking to answer geographical questions that are not served by commercial point-to-point navigation tools (e.g., Google Maps, MapQuest, and Yahoo!)? (short term)
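The manual and automated paths described above could be combined in one function that labels each base layer as foreground or background for a given request. In this sketch the wetlands rule follows the example in the text, while the population rule, the rule table PROMOTE, and the function name assign_roles are hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence

# The eight base layers for which USGS has responsibility, as listed in the text.
BASE_LAYERS = (
    "elevation", "hydrography", "transportation", "boundaries",
    "orthoimagery", "land cover", "structures", "names",
)

# Hypothetical rules: a requested overlay theme promotes these base layers to the foreground.
# The wetlands rule mirrors the text's example; the population rule is an assumption.
PROMOTE = {
    "wetlands": {"hydrography"},
    "population": {"boundaries", "transportation"},
}


def assign_roles(
    requested_themes: Iterable[str],
    expert_priorities: Optional[Sequence[str]] = None,
) -> Dict[str, str]:
    """Label each base layer 'foreground' or 'background' for one map request."""
    if expert_priorities:
        # Manual path: an expert user names the layers to emphasize.
        foreground = set(expert_priorities)
    else:
        # Automated path for nonspecialists: infer emphasis from the requested themes.
        foreground = set()
        for theme in requested_themes:
            foreground |= PROMOTE.get(theme, set())
    return {
        layer: "foreground" if layer in foreground else "background"
        for layer in BASE_LAYERS
    }
```

For a wetlands request, hydrography comes forward and boundaries recede; for a population request, hydrography is pushed to the background. User profiles could be added as a third source of rules without changing the function’s shape.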

USGS’s mapping challenge goes beyond route navigation and point location because it is responsible for multiple themes. The USGS viewer or server needs to flexibly alter symbols to bring them into the foreground or background based on map purpose and on the visual combination with other requested data. Implementing this flexibility will accommodate the majority of user needs and will help more adept users reach a useful starting symbol set when these prioritized styles are exported with downloads for further use in GIS software.

Research Question: What is the optimal combination of types and number of symbols for an inexperienced user to create an effective topographic map and accommodate a data overlay on a topic of interest using web tools? (short term)

This knowledge will guide further refinement of topographic mapping web resources for the public.

Priority CEGIS Research Topic: User-Centered Design for Implementation of The National Map Web Services

User-centered design of The National Map web services and viewers will improve usability by accommodating different needs for display and functionality. UCD is an interactive process of system development with user participation and evaluation (Box 3.2) and is a major research area in computer science and human-computer interaction (Nielsen, 1993, 1999; Shneiderman, 1998; Garrett, 2002). Web services are an advanced technology framework for web applications. The framework can provide high-level integration of multiple data-processing functions and information services hosted on different machines (web sites) (Tu and Abdelguerfi, 2006). Web services are very important for the future development of The National Map because they will extend it from generic mapping functions to advanced geospatial analysis and modeling tasks.
The adoption of UCD and web services approaches for The National Map will facilitate the integration of distributed geospatial information in USGS and improve the usability of The National Map. The UCD approaches developed in computer science are ready to be adapted for The National Map. In the vignette at the start of this section, for example, UCD could provide a better mapping tool to give firefighters easy access to The National Map.

To fulfill the different needs of USGS disciplines for displaying National Map layers, CEGIS research in UCD could address the design of multiple cartographic presentation methods. In fact, two of the 2007 CEGIS projects already relate to this topic. One project will create maps of ecohydrologic properties that can be used by cooperating projects and agencies. The other proposes data layers featuring 100 invasive plant species that are continuously updated with new field observations for use by the public and scientists. CEGIS guidance on how to format and represent new data layers so that they work with USGS base information (collected in The National Map) advances the scientific mission of all USGS disciplines.

Web services are interoperable, self-describing applications that can communicate with each other over the web. Many online GIS applications use web services for web mapping or geocoding functions (Peng and Tsou, 2003). For example, the popular Google Maps application programming interfaces (APIs) and the U.S. Census Bureau’s geocoding services (which convert U.S. street addresses into x,y-coordinates) are lightweight web service examples for GIS applications.

An important concept in the development of web services is Service-Oriented Architecture (SOA). SOA allows multiple applications running on heterogeneous platforms to connect to each other and create a chain of web services for different users and applications. Interoperability and openness are the two key advantages of web services. The openness of web service specifications encourages software developers to create flexible and customizable web applications based on web service standards, and interoperable web services allow end users or service consumers to combine multiple functions and operations into a single web document for their own needs. The OGC envisions that future web applications will be assembled from multiple geoprocessing and location services (OGC, 2004).
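Service chaining in an SOA can be illustrated by composing functions so that the output of one service feeds the next. In this Python sketch, geocode, bbox_around, and map_request are hypothetical local stand-ins for services that would normally run on separate servers and be called over HTTP; the address and coordinates are made up.

```python
from functools import reduce
from typing import Any, Callable, Dict, Tuple

Service = Callable[[Any], Any]


def chain(*services: Service) -> Service:
    """Compose services left to right: each one's output becomes the next one's input."""
    return lambda request: reduce(lambda data, service: service(data), services, request)


# Hypothetical stand-ins for remote services (geocoder, spatial utility, map service).
def geocode(address: str) -> Tuple[float, float]:
    gazetteer = {"100 Example Street": (-77.0, 39.0)}  # made-up address and coordinates
    return gazetteer[address]


def bbox_around(point: Tuple[float, float], half_width: float = 0.25) -> Tuple[float, ...]:
    x, y = point
    return (x - half_width, y - half_width, x + half_width, y + half_width)


def map_request(bbox: Tuple[float, ...]) -> Dict[str, Any]:
    return {"layers": ["hydrography", "transportation"], "bbox": bbox}


# One chained "service" that turns an address into a map request.
address_to_map = chain(geocode, bbox_around, map_request)
```

Here address_to_map("100 Example Street") returns {"layers": ["hydrography", "transportation"], "bbox": (-77.25, 38.75, -76.75, 39.25)}. Because each stage depends only on the previous stage’s output, any stage could be swapped for another provider’s service, which is the interoperability benefit the text describes.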
The adoption of web services in The National Map can integrate various GIS functions, maps, and data servers into a systematic web service framework rather than create scattered Internet GIS applications.

There are three major UCD-related tasks for the implementation of National Map web services: (1) user interface design for map viewers, (2) functional analysis for map servers, and (3) user testing and evaluation methods for National Map products. These short-term tasks can be accomplished by CEGIS within one to four years because the computer science community has developed comprehensive and effective UCD approaches that are ready to be used for the implementation and evaluation of National Map web services. Nonetheless, adopting UCD for The National Map and other USGS products will be challenging because it is difficult to classify multiple user groups and to determine what types of map content, symbols, user interfaces, and system functions are appropriate for each group.

BOX 3.2 International Organization for Standardization (ISO) Standard 13407: Human-Centered Design for Interactive Systems

The ISO 13407 standard includes five steps for the implementation of UCD applications:

1. Prepare and plan for human-centered work processes,
2. Understand and specify the context of use,
3. Specify the user and organizational requirements,
4. Produce design solutions, and
5. Evaluate designs against requirements.

One important aspect of the ISO 13407 standard is iterative looping through these steps (see figure below). Results from the fifth step are fed back into the product of the second step, in an iterative loop of user feedback and revision (ISO, 1999) that continues until the objectives and user needs are satisfied.

The ISO 13407 human-centered design processes for interactive systems. SOURCE: Modified from the ISO 13407 Model Overview, http://www.iso.org.
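The iterative loop in Box 3.2 can be written as a short control loop. In this toy Python sketch, the design simply implements every requirement gathered so far, and evaluation (step 5) reports user needs the design still misses, which are fed back into the context of use (step 2). The function name, the TRUE_NEEDS set, and the stopping rule are assumptions; real evaluation would involve user testing rather than a set difference.

```python
from typing import Callable, List, Set, Tuple


def human_centered_design(
    context: List[str],
    evaluate: Callable[[Set[str]], List[str]],
    max_iterations: int = 10,
) -> Tuple[Set[str], int]:
    """Loop steps 2-5 of ISO 13407 until evaluation finds no unmet needs.

    Step 1 (planning) is assumed done by the caller, who supplies the initial context.
    """
    context = list(context)
    for iteration in range(1, max_iterations + 1):
        requirements = set(context)   # steps 2-3: context of use -> requirements
        design = set(requirements)    # step 4: toy design implements every requirement
        unmet = evaluate(design)      # step 5: evaluate the design against user needs
        if not unmet:
            return design, iteration
        context.extend(unmet)         # feed findings back into the context of use
    raise RuntimeError("requirements still unmet after max_iterations")


# Hypothetical user needs, two of which the first context-of-use study missed.
TRUE_NEEDS = {"zoom", "toggle layers", "print to PDF"}
design, iterations = human_centered_design(["zoom"], lambda d: sorted(TRUE_NEEDS - d))
```

Starting from a context that records only the need to zoom, the loop converges in two iterations; the max_iterations guard reflects that, in practice, the loop ends when objectives are judged satisfied or resources run out.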

User Interface Design for National Map Viewers. User interface design (UID) develops computer software or hardware that can be used to access and operate information stored in a computer or other storage device. In GIS, design issues often focus on the graphical user interfaces offering icons, menus, and windows with which users manipulate and display geospatial information. CEGIS’s UID focus for The National Map viewer will allow customization for domain-oriented applications in USGS disciplines. For example, a currently funded CEGIS research project seeks to build a “user-friendly GIS tool” for local uncertainty analysis in watershed management decisions; confirming a user-friendly result requires user testing. Another current CEGIS project seeks to combine GIS, an invasive species database, and statistical capabilities for decision support using web tools, and this goal will require a user interface evaluation. A third project generates flood warnings for specific locations for end users, again requiring examination of the user interface.² Developing user interface evaluation techniques and propagating them to the USGS disciplines that build these tools will help the researchers effectively meet their stated goals.

The communication methods of The National Map viewers are also a very important issue for user interface design. Currently, Simple Object Access Protocol (SOAP), Representational State Transfer (REST), and JavaScript are popular methods for creating web-based mapping applications and viewers. CEGIS needs to analyze the advantages and disadvantages of various technology frameworks and methods for creating The National Map viewers and their user interfaces.

The user interface designs of Google Earth and MapQuest provide good examples for CEGIS on how to improve The National Map viewer.
However, this viewer will also have unique characteristics: it will need to establish and promote the USGS topographic map brand and provide access to the full depth of USGS geospatial data. It will need to move beyond the point-and-route functionality of popular map tools to smart combinations of features with names and attributes, including networks, areal data, model output, and spatial analysis results suited to environmental and social decision making.

Research Question: With the goal of updating and evaluating The National Map viewer user interface, (a) what types of user interfaces are appropriate for The National Map viewers, (b) does The National Map need different viewers for different users and map contents or is a single one appropriate, and (c) what kinds of communication methods are effective for disseminating geospatial information through web browsers? (short term)

2 See http://cegis.usgs.gov/proposals.html.
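Of the communication methods named above, a plain HTTP GET with key-value parameters is the style OGC Web Map Service (WMS) requests use; it resembles REST in that the entire request is carried in a URL, whereas a SOAP client would instead post an XML envelope. The sketch below builds a WMS 1.3.0 GetMap URL with Python’s standard library. The endpoint and layer names are hypothetical. CRS:84 is used because it keeps longitude-latitude axis order, whereas WMS 1.3.0 with EPSG:4326 expects latitude first.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def wms_getmap_url(base_url, layers, bbox, width, height, image_format="image/png"):
    """Build an OGC WMS 1.3.0 GetMap request as a plain HTTP GET with key-value parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                             # empty: default style for every layer
        "CRS": "CRS:84",                          # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy in CRS units
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


url = wms_getmap_url(
    "https://example.gov/wms",            # hypothetical endpoint
    ["hydrography", "transportation"],    # hypothetical layer names
    (-105.5, 39.5, -105.0, 40.0),
    800, 800,
)

# Decoding the query string shows exactly what the server will receive.
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```

Because every parameter is visible in the URL, such requests are easy to cache, bookmark, and debug, which is one of the trade-offs CEGIS would weigh against SOAP’s more formal message structure.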

Once these basic questions are answered, potential longer-term research topics for CEGIS in UID include novel interface approaches such as voice commands with natural language input, touch-table navigation tools, and augmented reality displays. Topics to be addressed will have to be based on what users demand, as fed to CEGIS through The National Map system design team, contact with users in the USGS disciplines, and other forums (discussed further in Chapter 4).

Functional Analysis for The National Map Server. Commercial geospatial data servers (such as the Environmental Systems Research Institute’s [ESRI] ArcIMS) and open source solutions (such as MapServer and GeoServer) can both provide a range of mapping functions and GIS capabilities. A major area of research is to determine which type of web mapping server is appropriate for The National Map. CEGIS’s research would have to draw on comprehensive user needs analysis and user feedback to help select appropriate map servers.

In general, open source GIS servers can provide flexible functions and customizable user interfaces, but web application developers need advanced programming skills and knowledge to use them. On the other hand, commercial web mapping packages are easier to implement and can provide advanced mapping functions with out-of-the-box tools and user interfaces. Open source programs can provide flexibility and adaptability for complicated projects, while commercial programs come with paid support that might be better for certain applications. CEGIS application developers will have to choose software packages that provide enough functionality to fulfill the needs identified in functional specifications and mapping service objectives.
These evaluations will include both the characteristics of mapping formats (such as image-based engines or stream-vector-data-based engines) and customizable GIS functions (identification, buffering, changing symbols and colors, etc.) (Tsou, 2004). Research could also assess web technologies that can improve the performance of web map servers, such as vector-compression algorithms, Asynchronous JavaScript and XML (AJAX), and Adobe Flex. In addition, some software development platforms, such as Java or .NET, can combine different web mapping technologies; these could also be evaluated. Selected and refined web map servers would be connected with The National Map database through collaboration between CEGIS and the National Geospatial Technical Operations Center (NGTOC).

Research Question: Will new web mapping technologies, such as vector-compression algorithms, AJAX, and Adobe Flex, improve the usability and system performance of The National Map servers and general web mapping applications? (short term)
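One family of vector-compression techniques quantizes coordinates to a fixed precision and stores each vertex as an integer offset from the previous one, so long coordinate strings become runs of small numbers that a later variable-length or general-purpose compression step can shrink. The same quantize-and-delta idea underlies formats such as Google’s Encoded Polyline Algorithm Format. The sketch below shows only that first step; the 1e-5 degree precision (roughly a meter) and the sample line are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode(coords: List[Point], precision: float = 1e-5) -> List[Tuple[int, int]]:
    """Quantize coordinates to an integer grid, storing each vertex as an offset from the previous one."""
    deltas = []
    prev_x = prev_y = 0
    for x, y in coords:
        qx, qy = round(x / precision), round(y / precision)
        deltas.append((qx - prev_x, qy - prev_y))
        prev_x, prev_y = qx, qy
    return deltas


def decode(deltas: List[Tuple[int, int]], precision: float = 1e-5) -> List[Point]:
    """Invert encode(): take running sums of the offsets and scale back to coordinates."""
    coords = []
    qx = qy = 0
    for dx, dy in deltas:
        qx, qy = qx + dx, qy + dy
        coords.append((qx * precision, qy * precision))
    return coords


# Hypothetical stream segment (longitude, latitude) with closely spaced vertices.
line = [(-105.2700, 40.0100), (-105.2701, 40.0102), (-105.2703, 40.0105)]
packed = encode(line)
```

Only the first vertex carries a large number; every later vertex is a pair of small integers, and decoding recovers each coordinate to within the chosen precision. Whether this kind of encoding actually improves National Map server performance is the empirical question posed above.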

User Testing and Evaluation Methods for National Map Products. Iterative processes of prototyping and user evaluation are central to UCD (Box 3.2). After the implementation of a web server and map viewers, the next stage is to carry out comprehensive user testing and evaluation procedures to assess the effectiveness of the web mapping services in the context of map use. This is an excellent way to determine the usefulness and functionality of a new system or application (Shneiderman, 1998). To help in planning future revisions, evaluation methods focus on usability and functional aspects of the prototype.

Usability problems are design flaws encountered while working with a web mapping tool that offers poor or diminished user control, flexibility, efficiency, legibility, understandability, feedback, error prevention, visibility, ease of use, consistency, conformance to standards, or accessibility. If not attended to, these design flaws can distract users from the overall purpose and potential of the prototype. Many user testing and evaluation methods are available for web-based GIS applications, such as expert review, questionnaires, videotaping, and web-log analysis (Shneiderman, 1998). These techniques need to be evaluated to determine which ones should be adopted for web-based National Map products. The most appropriate procedures for user testing and evaluation have to be selected, and then the most useful statistical methods for analyzing the test results need to be established, again with CEGIS collaborating closely with NGTOC and The National Map system design team.

Research Question: What is an appropriate standardized user testing and evaluation method for assessing and improving the effectiveness of National Map products?
(short term)

Open Geospatial Consortium Standard Profiles for Mapping

Interoperable web mapping services allow map users to combine multiple web-based map layers from different map servers in a single map viewer. For example, users would be able to combine any layers from The National Map with data layers provided by the National Aeronautics and Space Administration (NASA), the Environmental Protection Agency (EPA), the Census Bureau, or local government map servers to seamlessly integrate mapping services. In the earlier vignette, interoperability standards ensure that National Map data can be overlaid in real time with airborne imagery and National Weather Service maps. The OGC is a leader in generating interoperability standards.³

By customizing OGC Standard Profiles for The National Map web mapping services and map layer design, USGS will ensure interoperable web mapping services in all of its products and support their broader usage and accessibility. The major research challenge is to create standard profiles that are customized for USGS products yet strike a balance between OGC standards and proprietary data formats and protocols. This topic is a short-term task that can be accomplished in one to four years. Work on this topic will be an efficient step forward for USGS because many of its products are already built on OGC standards. Many OGC standards are associated with web mapping applications, including the Web Map Service (WMS), Web Feature Service (WFS), and Web Coverage Service (WCS).

3 OGC is an international industry consortium of 335 companies, government agencies, and universities participating in a consensus process to develop publicly available interface specifications. See http://www.opengeospatial.org/ogc.

Research Question: How should USGS create OGC standard profiles (which are a subset of standard specifications and customized standard content) to bring layers in The National Map databases into conformance with OGC standards? (short term)

The committee offers three examples of where it would be useful for CEGIS to work on adding or adapting OGC standard profiles.

1. A hydrological layer in The National Map using WCS (as opposed to the current WMS) standards would allow inclusion of temporary information (such as pictures). This augmentation would, for example, help in recording flood danger and damage.
2. Customized metadata formats for particular USGS map layers would help meet the needs of different USGS users. Lengthy metadata with a one-size-fits-all format for all elements obstruct readability and interpretation by lay users.
3. Work on OGC styled layer descriptors (SLDs) would improve the quality and clarity of USGS graphic products. USGS cartographers are currently constrained by the limited specifications of SLDs, which contribute to the coarse character of WMS products. OGC’s SLD extends WMS by allowing users and servers to set symbols and colors.
The options for point, line, polygon, and text elements include color, width or size, opacity, texture, rotation, and halos. Lines may be outlines of polygons or centerlines and can be dashed. Text options include font family, style, weight, size, offset or displacement, and label anchor points. These are basic specifications, limited to setting only two font characteristics (weight and size), with no dynamic variation in anchor position based on nearby features. Other cartographic needs, such as line spacing for stacked labels, curved labels, and character spacing, are not specifically addressed in the OGC SLD and symbology encoding documentation.

Much current work on automated label placement focuses on point label placement. For example, Kameda and Imai (2003) extend a slider algorithm, Ebner et al. (2003) develop a force-based simulated annealing algorithm, and Stadler et al. (2006) apply a two-step approach that combines algorithms. In contrast to this automated placement work, which can be applied to on-the-fly labeling, WMS such as Google Maps look good because they are built from predesigned tiles with careful label placement and line joins made in advance rather than on the fly. By studying how to combine pre-placed elements in on-the-fly combinations, CEGIS can maintain USGS’s mapping prominence, enhance the combination of multiple data sets, and contribute to the quality of public data display for more map purposes than are covered by Google Maps and other popular WMS.

For example, the type of road and point location reference information shown over remotely sensed images on Google Maps need not be limited to transportation features for USGS mapping. Hydrographic features such as streams and springs, physiographic features such as ridges and valleys, cultural features such as post offices and landmarks, and commercial features such as gravel pits and orchards could all be annotated on imagery depending on user interests. It is unlikely that online map users will want to download the spatial data files for an area of interest and combine them with labels in high-end graphics software. CEGIS research on automatic label placement could offer the advantage of USGS design skills for reference labels over dynamic selections of spatial data, instead of the limited WMS SLD labels that plague current viewers.

Research Question: How can USGS overlay well-positioned labels with clear categories and hierarchies on top of symbolized features dynamically set to foreground and background depending on user interests? (short term)

Through such research, nonspecialist users would have the benefit of USGS label placement skills (see, e.g., Figure 3.4 and the stark contrast in quality between map viewer and printable PDF maps for the National Atlas in Figure 3.5) and be able to produce ready-to-read and ready-to-share maps to support their localized decisions.
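The committee does not prescribe a placement algorithm. As a point of reference only, the sketch below shows the simplest form of the problem: a greedy, priority-ordered search over four fixed candidate positions per point, skipping any position that collides with a point symbol, an obstacle, or a label already placed. It is not the slider or simulated annealing methods cited above, which search many more positions and can revisit earlier choices. The tuple layout, the preference order, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in map units."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other):
        return not (self.x1 <= other.x0 or other.x1 <= self.x0 or
                    self.y1 <= other.y0 or other.y1 <= self.y0)


# Candidate positions in order of preference:
# upper right, upper left, lower right, lower left of the point.
CANDIDATES = [(1, 1), (-1, 1), (1, -1), (-1, -1)]


def candidate_box(x, y, w, h, dx, dy, gap=1.0):
    left = x + gap if dx > 0 else x - gap - w
    bottom = y + gap if dy > 0 else y - gap - h
    return Box(left, bottom, left + w, bottom + h)


def place_labels(points, obstacles=(), marker=0.5):
    """Greedily place labels for (name, x, y, width, height, priority) tuples.

    Point symbols and any obstacles (e.g. other symbolized features) count as
    occupied space. Labels with no free candidate position are dropped.
    """
    occupied = list(obstacles)
    occupied += [Box(x - marker, y - marker, x + marker, y + marker)
                 for _, x, y, _, _, _ in points]
    placed = {}
    # Higher-priority labels (e.g. major features) claim space first.
    for name, x, y, w, h, _ in sorted(points, key=lambda p: -p[5]):
        for dx, dy in CANDIDATES:
            box = candidate_box(x, y, w, h, dx, dy)
            if not any(box.overlaps(o) for o in occupied):
                placed[name] = box
                occupied.append(box)
                break
    return placed


springs = [("Spring A", 0, 0, 10, 3, 2), ("Spring B", 5, 2, 10, 3, 1)]
print(place_labels(springs))
```

Here "Spring A" cannot take its preferred upper-right slot because that box would cover the symbol for "Spring B", so it falls back to the upper left. Greedy placement is fast enough for on-the-fly use but can strand low-priority labels that a global search would fit.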

FIGURE 3.4 An illustration of the quality of label placement that USGS is able to supply, as annotation on a landforms map by Tom Patterson, National Park Service. This label quality is not offered by Web Map Servers. SOURCE: Tom Patterson, http://www.nacis.org/data/us_physical/gallery/10_colorado.jpg

To provide the most usable mapping tools, the major challenge in WMS research is to strike a balance between the OGC standards and proprietary data formats and protocols (such as ArcIMS AXL or Google Earth’s Keyhole Markup Language). To develop OGC standard profiles, CEGIS may work closely with OGC, ISO/TC 211, and NASA’s Geoscience Interoperability Office. USGS has already established the brand and look of U.S. topographic maps and can build on this background to adjust and update the look for current display media. These styles may be delivered by OGC SLD, ArcMap style files, Illustrator templates and styles, or other style mechanisms.

FIGURE 3.5 Differences in map design and label quality between a prepared PDF map file and a dynamically generated web map, both offered at the USGS National Atlas web site: (A) a portion of the Colorado Printable Map of Federal Lands and Indian Reservations. SOURCE: USGS National Atlas, http://www.nationalatlas.gov/printable/images/pdf/fedlands/co.pdf
(B) Federal Lands map produced in the National Atlas Map Maker, comparable in purpose and extent to (A). SOURCE: USGS National Atlas, http://www.nationalatlas.gov/natlas/Natlasstart.asp

Integration of Data from Multiple Sources

The San Diego fire is not yet contained. The crew assesses the current boundary of the fire, overlaid on the topographic map, which explains the difficulty of containing the spread upslope. However, there is still the unexplained spread to the east. The crew accesses the National Weather Service wind forecast, which is provided at a scale of 1:125,000, compared to the topographic map at 1:24,000. The crew invokes a tool to generalize the topographic map to the smaller scale of the weather data, and a trend emerges. To determine high-priority targets, the crew calls up an address directory and uses simple controls to geocode the addresses on the fire map, showing the location of structures in the fire’s path. To understand possible paths to fire sites, a layer with roads and another with trails are spatially matched (conflated) with the generalized map of topography. Finally, a remote sensing image with vegetation types is fused with the other layers to determine potential fuel loads along the fire path.

Integrating spatial data sets from a wide range of sources presents a fundamental research challenge for CEGIS. Spatial data sets at disparate scales, resolutions, and quality are difficult to fuse or merge, and a series of issues arises in bringing these disparate data together for spatial analysis and decision making. The most basic challenge involves the compatibility of the geometry. For example, when the original topographic data are not available, can the stream network layers and contour layers be aligned? Do the streets align with the census tract boundaries? Because of the myriad decisions made in creating the original data sets, including scale, resolution, data quality, and feature selection, many or most features will not align exactly. A second challenge involves the “semantics” of integration. Is a “swamp,” for instance, categorized identically by two different agencies?
Do two different counties classify a county road in a similar way? This involves semantic interoperability, where fusion also must occur at the attribute description level. As an example, integration was a central challenge when comparing changes over time in census tract boundaries in the National Historical Geographic Information System (NHGIS). This system compiled all tract boundaries for the United States back to 1910, when the eastern cities were first tracted. Figure 3.6 illustrates the spatial mismatch between the 1990 Topologically Integrated Geographic Encoding and Referencing (TIGER) boundaries and the 1980 tract boundaries. When integrating these two data sets, the less accurate 1980 tract boundaries would be snapped to the more accurate 1990 TIGER boundaries.
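The snapping step can be sketched as moving each vertex of the less accurate boundary onto the nearest point of the more accurate one, provided the vertex lies within a tolerance. This is a minimal illustration, not the NHGIS procedure: the tolerance, the coordinates (arbitrary map units), and the function names are assumptions, and a production tool would also need to preserve topology and insert reference vertices where the target line has none.

```python
import math


def closest_point_on_segment(p, a, b):
    """Nearest point to p on segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return a
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return (ax + t * dx, ay + t * dy)


def snap_to_reference(target, reference, tolerance):
    """Move each target vertex onto the reference line if it lies within tolerance."""
    snapped = []
    for p in target:
        best, best_d = p, tolerance
        for a, b in zip(reference, reference[1:]):
            q = closest_point_on_segment(p, a, b)
            d = math.dist(p, q)
            if d <= best_d:
                best, best_d = q, d
        snapped.append(best)
    return snapped


tiger_1990 = [(0, 0), (10, 0)]                # more accurate reference boundary
tract_1980 = [(1, 0.3), (5, -0.2), (9, 2.0)]  # less accurate boundary
print(snap_to_reference(tract_1980, tiger_1990, tolerance=0.5))
```

The third vertex stays put because it is 2 units from the reference line; choosing the tolerance is itself a data quality decision, since too large a value snaps genuinely different boundaries together.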

FIGURE 3.6 Mismatch between 1990 tract boundaries (red lines) and 1980 boundaries (purple lines). SOURCE: Van Riper (2003); used with permission of the author.

Given the many data sets that might be integrated into The National Map, including, for instance, terrain and contours, vegetation, hydrography, transportation, and cultural features, the success of The National Map will depend on the ability to integrate or harmonize these disparate sources spatially and with the accompanying attributes. Two research topics are at the heart of data integration and must be addressed early on: generalization and fusion (Box 3.3).

RECOMMENDATION 4: The two priority research topics for CEGIS within the area of data integration should be generalization and fusion.

Scale is a fundamental consideration in spatial data integration, and differences in scale between two data sources influence the level of the integration challenge. There are direct and strong relationships among scale, information content, and generalization. Geographical features are often scale-dependent and appear differently depending on the scale at which they are portrayed. For example, Figure 3.7 shows the same section of a stream taken from topographic maps at four different scales (1:24,000, 1:50,000, 1:100,000, and 1:250,000). The top half of the figure depicts the line segments at the original map scale, and the bottom half enlarges three of them so that all are represented at 1:24,000. The detail on each line segment depends on the scale, which determines how much map space is available to represent the segment. If, for instance, two hydrographic data sets at two different scales (e.g., 1:50,000 and 1:100,000) were being fused, the 1:50,000 version would need to be generalized to the 1:100,000 level of detail for spatial compatibility; that is, the level of spatial detail needs to be commensurate. An additional problem
involves the selection of the “optimal” scale that will be needed for the given problem, such as hydrologic modeling or depicting the dispersion of West Nile Virus in a region. This relates to the problem of “scale hierarchy,” which argues that there is a certain natural range of scales at which geographical processes operate: the operational scale (McMaster and Sheppard, 2004).

BOX 3.3 Data Integration, Data Fusion, and Generalization

Integration. Data integration is the assembly of information from different sources such that they work together as a whole. It is the combination of complementary information physically and logically such that applications can be written to make use of all relevant data (Jhingran et al., 2002).

Fusion. Data fusion has the connotation of physically merging data sets and is often associated with merging images and other forms of sensor data. Llinas and Hall (1998) describe data fusion as the techniques for combining data from multiple sensors to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor. Data fusion has also been broadly defined as "the process of organizing, merging and linking disparate information elements (e.g., map features, images, text reports, video, etc.) to produce a consistent and understandable representation of an actual or hypothetical set of objects and/or events in space and time" (OGC, 2000). For the purposes of The National Map, data fusion will be required, for instance, to merge remote sensing images at several resolutions (e.g., 30 m Thematic Mapper (TM) imagery for year 1 with 10 m SPOT (Systeme Pour l'Observation de la Terre) imagery for year 2); to merge two layers generated at different scales (e.g., a 1:24,000 road layer and a 1:100,000 vegetation layer); or to bring together two shoreline data sets from different time periods.
Nearly all modeling and spatial analysis routines assume that data are harmonious in terms of scale, resolution, and quality.

Generalization. Generalization reduces the information content of maps because of scale change, map purpose, intended audience, and/or technical constraints. For example, when a 1:24,000 topographic map (large scale) is reduced to 1:125,000 (small scale), some of the geographical features must be either eliminated or modified, since the amount of map space is significantly reduced. All maps are, to some degree, generalizations, as it is impossible to represent all features from the real world on a map, no matter what the scale.
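The arithmetic behind "the amount of map space is significantly reduced" is simple. Assuming, for illustration only, that 0.5 mm is the smallest separation a reader can distinguish on a printed map (a threshold not given in the text), the sketch below converts that threshold into ground distance at each scale shown in Figure 3.7 and computes how much map area remains when a 1:24,000 sheet is reduced to 1:125,000. The function names are hypothetical.

```python
def ground_distance_m(scale_denominator, map_mm=0.5):
    """Ground distance in metres represented by `map_mm` millimetres on the map."""
    return scale_denominator * map_mm / 1000.0


def remaining_area_fraction(source_denom, target_denom):
    """Share of the original map area left after reduction to the target scale."""
    return (source_denom / target_denom) ** 2


for denom in (24_000, 50_000, 100_000, 250_000):
    print(f"1:{denom:,}  0.5 mm = {ground_distance_m(denom):g} m on the ground")
print(f"1:24,000 -> 1:125,000 keeps {remaining_area_fraction(24_000, 125_000):.1%} of the area")
```

Under this assumption, detail finer than 12 m can be shown at 1:24,000 but not detail finer than 125 m at 1:250,000, and the same ground area occupies less than 4 percent of its former map space at 1:125,000, which is why features must be eliminated or modified.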

FIGURE 3.7 Four lines representing the same section of a stream taken from topographic map sheets drawn at four different scales. SOURCE: Thibault (2001); used with permission of the author.

With the goal of building The National Map from multiple (multiscale) data sources, methods will be needed to generalize and represent data at other scales, even down to neighborhood-level information on topics such as environmental quality or social conditions (e.g., position of the first sighting of a bird in spring, addresses of boarded-up housing, location of noxious odors) (Ghose and Huxhold, 2002). Generalization is thus required for scale harmonization before fusion: two data sets must be at basically the same level of detail (or granularity) before they can be merged effectively.

Automated generalization and fusion have proven to be difficult research areas to move forward in a practical manner; therefore it is a challenge to identify particular research questions that meet the priority for “early wins.” The following section covers the generalization and then the fusion research needs at CEGIS that the committee feels can best be addressed in a range of time frames.

Priority CEGIS Research Topic: Generalization

Generalization research has progressed over the last 20 years, including work on algorithmic design (Regnauld and McMaster, 2007), database requirements (Mustiere
and van Smaalen, 2007), and fundamental understanding of the process (Harrie and Weibel, 2007). Much of the current research in generalization is reported in a recently published book by the International Cartographic Association (Mackaness et al., 2007).

Many of the leading generalization researchers are in Europe. The Conception Objet et Généralisation de l’Information Topographique (COGIT) Laboratory at the Institut Géographique National (IGN) has worked in many areas of cartographic generalization, including data modeling to support generalization, algorithmic design and testing, and the application of agent-based methods to enable intelligent generalization. The latter project involved a collaboration among the IGN, the University of Edinburgh, the University of Zurich, and Laser-Scan (now called 1Spatial), a GIS company based in Cambridge, England. Significant work on generalization is also taking place at the University of Hannover in Germany and at European National Mapping Agencies.

In the United States, the National Science Foundation-funded National Historical Geographic Information System, housed at the University of Minnesota, is working to develop a multiple-scale database for census information. In addition, ESRI has developed several generalization procedures and has a small generalization team in place to improve ArcGIS’s capabilities. The field is mature in the development of generalization tools, such as simplification and smoothing routines, but the research community is not as far along in understanding the importance of scale and generalization and in identifying optimal scales for certain geographical processes. The USGS already has several ongoing projects related to generalization, including “Generalization for The National Map” as detailed in Appendix D.

Figure 3.8 shows the importance of robust approaches to generalization.
In this example, Figure 3.8(a) represents raw TIGER data and the problems in coastal areas with a high density of coordinate information. Figure 3.8(b) depicts the results of the NHGIS generalization of this coastal area, while Figure 3.8(c) shows the Census generalization. Figure 3.9 shows the generalization of a piece of the Florida coastline at two different scales (1:150,000 and 1:400,000). The NHGIS generalization eliminates some larger inlets and simplifies the boundaries for an appropriate representation at the desired scale.
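The text does not say which simplification algorithm NHGIS uses. As an illustration of the distance-based criteria most simplification routines rely on, the sketch below implements the widely used Douglas-Peucker algorithm, swapped in here as an example rather than presented as the NHGIS method. It keeps the vertex farthest from the line joining a segment's endpoints when that distance exceeds a tolerance, then recurses on both halves. Coordinates and tolerance values are hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.dist(p, a)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the first-to-last trend line.
    index, max_d = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_d:
            index, max_d = i, d
    if max_d <= tolerance:
        return [first, last]          # every interior vertex is weeded out
    # Otherwise keep that vertex and simplify each half independently.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right          # drop the duplicated split vertex


coast = [(0, 0), (1, 0.1), (2, 3), (3, 0.1), (4, 0)]
print(douglas_peucker(coast, tolerance=1.0))
```

With a tolerance of 1.0 the two near-collinear vertices are dropped and the prominent point at (2, 3) survives; a smaller tolerance keeps more detail. Tying the tolerance to the target scale is what makes the output "appropriate" for that scale.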

FIGURE 3.8 County boundaries along the Florida Gulf Coast drawn at 1:2,000,000: (a) base data from the Census TIGER files, with inland water extensions clipped, (b) NHGIS generalization for a 1:2,000,000 target scale, and (c) the Census cartographic boundary files. SOURCE: With permission from the NHGIS and Jonathan Schroeder.

FIGURE 3.9 The Florida Gulf Coast produced at a scale of 1:150,000. SOURCE: With permission from the NHGIS and Jonathan Schroeder.

In an automated environment, the generalization process is complex and mathematically based. Whereas “human” cartographers have been generalizing maps for hundreds of years through the application of geographic logic, computers require exact instructions, or algorithms. The latest generalization processes involve applying one or a series of “operations.” There are many such
operations, and the five that CEGIS would most likely have to consider are simplification, smoothing, refinement, exaggeration, and enhancement (Box 3.4).

BOX 3.4 Types of Generalization Operations

Five possible generalization operators that CEGIS would need are discussed below. These are only a subset of a much larger collection of operations that have been developed. The five are simplification, smoothing, refinement, exaggeration, and enhancement (see figure below).

1. Simplification is the most commonly used generalization operator. This involves, at its most basic level, a “weeding” of unnecessary coordinate data. The goal is to retain as much of the geometry of the feature as possible, while eliminating the maximum number of coordinates. Most simplification routines utilize complex geometrical criteria (distance and angular measurements) in selecting significant or critical points.
2. Smoothing (not to be confused with simplification) shifts the position of points to improve the appearance of the feature. Smoothing algorithms either relocate points in an attempt to plane away small perturbations and capture only the most significant trends of the line, or add points using splining routines (McMaster and Shea, 1992). As with simplification, there are many approaches to the process. Some of these operate at the local level, while others process the entire line at once. Careful integration of simplification and smoothing routines can produce a simplified, yet aesthetically acceptable, result (McMaster, 1989).
3. Refinement is a form of resymbolization that involves reducing a set of multiple features, such as roads, buildings, and other types of urban structures, to a simplified representation. The idea behind refinement is that such complex geometries are resymbolized to a simpler form that represents a “typification” of the objects.
The example of refinement shown in the figure below is the selection of a stream network to depict the major characteristics of the distribution in a simplified form. This might be accomplished, for instance, by eliminating streams of order 4 or higher based on the attribute field for the stream.
4. Exaggeration is one of the more commonly applied generalization operations. Often it is necessary to amplify a specific part of an object to maintain clarity in scale reduction. The example in the figure below depicts the exaggeration of the mouth of a bay that would close under scale reduction, as would occur with San Francisco Bay or New York Harbor.
5. Enhancement involves a symbolization change to emphasize the importance of a particular object. For instance, the delineation of a bridge under an existing road is often portrayed as a series of cased lines that assist in emphasizing one feature over another. Enhancement also involves the enlargement of certain symbols, such as buildings, as scale is reduced and the object falls below the minimum size that can be shown.
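Smoothing, unlike simplification, moves points rather than removing them. A minimal local smoothing sketch follows, assuming a centered moving average over a small window; this is only one of the many possible approaches (splining, mentioned in item 2, is another). Endpoints are held fixed so that the line still joins its neighbors, and the function name is hypothetical.

```python
def moving_average(points, window=3):
    """Replace each interior vertex by the mean of a centered window of vertices.

    Endpoints are kept in place and the vertex count is unchanged.
    """
    if len(points) < 3:
        return list(points)
    half = window // 2
    smoothed = [points[0]]
    for i in range(1, len(points) - 1):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    smoothed.append(points[-1])
    return smoothed


zigzag = [(0, 0), (1, 3), (2, 0), (3, 3), (4, 0)]
print(moving_average(zigzag))
```

The zigzag's swings shrink from 3 units to about 1, planing away small perturbations while keeping every vertex. Running a simplification pass afterward would then remove vertices the smoothed line no longer needs, the kind of combination the box describes.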

Fundamental generalization operations. SOURCE: Slocum et al. (2005); used with permission.

Work on The National Map will require CEGIS researchers to develop unique generalization operations that can be automated for the many possible data types and map scales. The research questions in this area vary, with short- and long-term timelines. As a first step, CEGIS will have to complete a needs assessment for generalization to prioritize work in this area. This should yield the specific generalization challenges for The National Map. Further refinement of the assessment would identify the features most relevant to The National Map and so limit the scope of this research. In generalization and scale manipulation,
locational conflicts between objects are possible, and research should address this issue. When maps are generalized and fusion is attempted, the loss of information can affect the success of the fusion process. Research is needed to define the boundaries of scale ranges for such operations. Some key questions that need to be addressed in this prioritization include the following:

Research Question: What are the specific new generalization operations and algorithms that will be needed for The National Map? (short term)

Research Question: What feature-based generalization is needed for The National Map (the focus would be on a specific feature, such as a stream, and approaches needed for stream generalization) and how can that be accomplished? (long term)

Research Question: What new kinds of measurements will be needed to determine locational conflicts between USGS features? (short term)

Research Question: What are the effective scale ranges for fusing two layers together, and how does generalization affect fusion? (long term)

After the assessment of needs addressed by these questions, priority will have to be given to determining which algorithms should be applied to the major feature classes. For example, this study might focus on the generalization of road networks using simplification, smoothing, and displacement. CEGIS research on generalization will have to address challenges beyond integrating data. These challenges include generalization for cartographic display, which fortunately shares many of the same approaches.

Generalization is also needed to create multiscale-multiresolution databases. These databases are increasingly required by cartographers and other geographic information scientists. This approach assumes that from a master database one can generate additional versions at a variety of scales (Box 3.5). The scope of such multiscale databases is driven by the user.
For example, when mapping census data at the county level, a user might wish to have significant detail in the boundaries. Alternatively, when using the same boundary files at the state level, less detail is needed. Since the generation of digital spatial data is extremely expensive and time consuming, one master version of the database is often created and smaller-scale versions are generated from it. USGS will need to consider multiscale-multiresolution databases carefully in the context of The National Map and their relationship to geospatial data and processes. CEGIS has one project focused on generalization (see Appendix D) on which further research can be built.
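The master-database approach can be sketched as a store that keeps one master version and derives each smaller-scale version the first time it is requested. In this sketch the master scale (1:24,000), the 0.5 mm visibility threshold used to set the simplification tolerance, and the class and function names are all assumptions; radial-distance simplification (dropping vertices too close to the last kept vertex) stands in for whatever chain of generalization operators a real system would apply.

```python
import math

MASTER_SCALE = 24_000      # assumed scale denominator of the master database
MIN_SEPARATION_MM = 0.5    # assumed smallest distinguishable distance on the map


def radial_simplify(line, tolerance):
    """Drop vertices closer than `tolerance` to the previously kept vertex."""
    if len(line) < 3:
        return list(line)
    kept = [line[0]]
    for p in line[1:-1]:
        if math.dist(p, kept[-1]) >= tolerance:
            kept.append(p)
    kept.append(line[-1])
    return kept


class MultiscaleDatabase:
    """Hypothetical master database that derives smaller-scale versions on demand."""

    def __init__(self, master):
        self.master = master   # {feature_id: [(x, y), ...]} in ground metres
        self._versions = {}

    def version(self, scale_denominator):
        if scale_denominator < MASTER_SCALE:
            raise ValueError("cannot derive a larger-scale version than the master")
        if scale_denominator == MASTER_SCALE:
            return self.master
        if scale_denominator not in self._versions:
            # Ground distance that falls below the visibility threshold.
            tolerance = scale_denominator * MIN_SEPARATION_MM / 1000.0
            self._versions[scale_denominator] = {
                fid: radial_simplify(line, tolerance)
                for fid, line in self.master.items()
            }
        return self._versions[scale_denominator]


db = MultiscaleDatabase({"county_line": [(0, 0), (5, 0), (30, 0), (60, 0), (100, 0)]})
for denom in (50_000, 100_000, 250_000):
    print(denom, db.version(denom)["county_line"])
```

Each derived version is cached, so a state-level user and a county-level user draw on the same master without either version being digitized separately. The design also makes clear why updates are hard: any change to the master invalidates every cached version.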

BOX 3.5 Research on Multiscale Databases

Although the European literature contains a number of conceptual frameworks for automated map generalization, few have had as significant an influence on U.S. researchers as the models of Sarjakoski (2007) and Brassel and Weibel (1988). The German ATKIS (the Official Authoritative Topographical Cartographic Information System; Vickus, 1995) and Kilpelainen (1997) developed innovative frameworks for the representation of multiscale databases. Assuming a master cartographic database, called the Digital Landscape Model (DLM), this research proposed a series of methods for generating smaller-scale Digital Cartographic Models (DCMs). The master DLM is the largest-scale, most precise database possible, whereas secondary DLMs are generated for smaller-scale applications. DCMs, on the other hand, are the actual graphical representations, derived through a generalization-symbolization of the DLM. The master DLM is used to generate smaller-scale DLMs (model generalization), which are then used to generate a DCM at that level. The assumption is that DCMs are generated on an as-needed basis (cartographic generalization). An additional complexity may be that the boundaries change at given times, such as decadal change for the Census Bureau’s TIGER data. For many human-social and environmental databases, research is needed to develop and update multiscale versions.

Priority CEGIS Research Topic: Data Fusion

Despite more than a decade of research on this topic, fusing disparate data sources is still a significant challenge to be confronted by CEGIS. The most significant challenge is the fusion of spatial data sets that are generated from different sources, such as a Department of Transportation road layer, a USGS hydrographic layer, and a set of census tract boundaries. It is also a significant issue when considering fusing the diverse scientific data sets of the USGS into The National Map.
Each could be at a slightly or significantly different scale, be of different quality, and be in a different data model that must be converted. Harmonizing such disparate data sets, through the application of map conflation and edge matching, will need to be an early and high priority for CEGIS. Other related, but perhaps longer-term and lower-priority, topics include fusion across time and fusing spatial with aspatial data. These approaches are described in more detail below.

The success of CEGIS and The National Map will require that the capability for data integration and fusion be in place at an early stage. Indeed, CEGIS recognizes that this is a significant issue and has an active research project titled Automated Data Integration (Appendix D). This project is looking at integration from both a layer-based and a feature-based approach.

Conflation and Edge Matching. Map conflation and edge matching are related approaches that are often used to fuse data. Conflation involves first identifying features within one reference map that are accurate locations of real-world
objects that need to be combined with one or more target maps (DeMers, 2003). A set of control points is identified, and their locations are reconciled with the selected objects. Next, often in an iterative process, the features are shifted to obtain the best possible alignments. Edge matching involves “zipping” together features along map sheet edges. This process in particular will be critical for creating the seamless quality desired for layers in The National Map. Typically, conflation and edge-matching alignments are performed manually by identifying a set of control point pairs across different layers and then using “rubber-sheeting” techniques to align the features (Saalfeld, 1987). This approach is slow and labor intensive and usually generates a single solution across the entire layer(s) that need to be aligned; that is, current methods do not usually allow for specific location-based solutions.

Knoblock and Shahabi (2007) developed techniques for accurately and automatically integrating vector data (which represent space with points, lines, and areas) with high-resolution color imagery. Their approach uses automated, localized image-processing techniques to find control point pairs in the two data sets being fused. In addition, they developed novel filtering techniques to remove inaccurate control points. Their approach does not need to locate all of the intersection points to align the vector data accurately with the imagery; current implementations of their methods rely on triangulation and rubber-sheeting algorithms to align the remaining lines and points in the two data sets using the control point pairs (Figure 3.10). This approach, while promising, generates almost as many questions as it does answers.
First and foremost, a much larger and more versatile set of tests is needed to demonstrate the efficacy of these new techniques and whether or not they can be used to integrate multiple data themes and feature types across a variety of land surfaces and cover types. These are important subtleties, and early test results (Chen et al., 2003, 2004), along with those from the standard manual techniques, suggest that transportation data (the only type examined so far) are among the most likely to generate favorable results, and that this outcome is especially likely when the method is applied to areas with flat terrain.
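The two steps described for automated conflation, filtering unreliable control point pairs and then warping the remaining features, can be sketched in a few lines. This is an illustration under stated assumptions, not Knoblock and Shahabi's method: the filter simply discards pairs whose displacement departs from the median displacement by more than a threshold, and the warp uses inverse-distance-weighted interpolation of control point shifts, a simpler relative of the triangulation-based rubber sheeting described above. Each pair is (vector location, image location); all names and values are hypothetical.

```python
import math
from statistics import median


def filter_control_points(pairs, max_dev):
    """Drop pairs whose shift differs from the median shift by more than max_dev."""
    shifts = [(t[0] - s[0], t[1] - s[1]) for s, t in pairs]
    mx = median(dx for dx, _ in shifts)
    my = median(dy for _, dy in shifts)
    return [pair for pair, (dx, dy) in zip(pairs, shifts)
            if math.hypot(dx - mx, dy - my) <= max_dev]


def rubber_sheet(point, pairs, power=2.0):
    """Shift `point` by the inverse-distance-weighted mean of control point shifts."""
    num_x = num_y = den = 0.0
    for (sx, sy), (tx, ty) in pairs:
        d = math.dist(point, (sx, sy))
        if d == 0:
            return (tx, ty)            # the point is itself a control point
        w = 1.0 / d ** power           # nearer control points dominate
        num_x += w * (tx - sx)
        num_y += w * (ty - sy)
        den += w
    return (point[0] + num_x / den, point[1] + num_y / den)


# (vector location, image location) pairs; the last one is a bad match.
pairs = [((0, 0), (1, 0)), ((10, 0), (11, 0)), ((0, 10), (1, 10)),
         ((10, 10), (11, 10)), ((5, 5), (5, 20))]
good = filter_control_points(pairs, max_dev=2.0)
road_vertices = [(5, 5), (2, 8)]
print([rubber_sheet(v, good) for v in road_vertices])
```

Because each point's shift is weighted toward its nearest control points, different parts of a layer can move by different amounts, which is the kind of location-based solution that a single layer-wide transformation cannot provide. The function assumes at least one surviving control point.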

FIGURE 3.10 Automatic conflation of road vector data with imagery. This illustration depicts the conflation process applied to an orthophoto and a street centerline file. After key registration points (street intersections, marked by circles) are identified, the street line file is spatially aligned with the orthophoto. The right-hand illustration shows the corrected images. SOURCE: Knoblock and Shahabi (2007); used with permission.

Although The National Map will, in theory, be a seamless database that does not require true edge matching, many other projects that integrate USGS data with data from state and local levels will require edge-matching capability. Another process that will be needed is “splicing,” sometimes called coordinate inlay, in which a part of one map (database) is spliced into a larger map, such as a newly designed freeway interchange into an existing transportation layer. The challenges of this type of fusion are illustrated in the case of integrating hydrologic networks derived from topographic data (often from digital elevation models) with transportation data in which the roads often follow the hydrology. Given the different processes that generated these layers, they will not match exactly and will have to be harmonized. Conflation and edge matching are typically not automated and are therefore time consuming.

Research Question: What are the data quality issues related to spatial data integration and fusion? (short term)

CEGIS would benefit from coordinating its existing fusion research with related activities in the National Geospatial-Intelligence Agency (NGA) geospatial research program.
In that program, academic researchers are looking at topics such as Multi-Sensor Data Fusion, Analysis, and Visualization; Seamless Integration of Geospatial Data from Water to Land; Conflation Research in Support of Gazetteers; Spatial Uncertainty Models to Automate and Enhance Fusion; and Spatiotemporal Data Fusion (NGA, 2006).

Fusion Across Time. The need for fusing data layers across time becomes apparent when considering coastal processes that are being analyzed using data
such as shoreline positions that change through time. Fusion methods for harmonizing such temporal data sets will be essential to realize the full utility of The National Map. Although research on data fusion has developed a series of approaches over the past few decades, such as the efforts in map conflation and rubber sheeting, there is no comprehensive “fuse-layers” button to push. Significant work remains, first in harmonizing the data sets in terms of scale and quality and then in developing approaches to fine-tune the integration spatially on specific parts of the map. The upcoming section on data models includes a related discussion of research on spatiotemporal models.

Fusing Spatial and Aspatial Data. To maximize its value, The National Map will likely move beyond solely fusing the traditional USGS natural resource data types and be sufficiently flexible to integrate multiple forms of social data, including census, economic, and agricultural data, many of which are aspatial.

Research Question: How can areal interpolation, as a key method for fusing aspatial data with spatial data, be applied in The National Map? (long term)

Significant work on areal interpolation has been completed by Tobler (1979), with his pycnophylactic approaches; by Gregory (2002), working with the United Kingdom’s historical GIS project; and in research at the NHGIS (Van Riper, 2003). Much of this earlier work has focused on areal interpolation across disparate boundaries such as census tract boundaries. Many census tract boundaries have shifted over time as enumeration units were split or merged, producing noncomparable statistical units (e.g., census tract 101 in 1980 may be split into 101A and 101B in 1990). As a result, a direct comparison of population statistics, such as median housing value, is not meaningful unless the data are reaggregated. There are several procedures for dealing with such spatially noncoincident boundary files.
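The simplest of these procedures, areal weighting, apportions each source-zone count to target zones in proportion to overlapping area, assuming the count is spread evenly across the source zone. The sketch below applies it to the tract 101 split; the population figure and the overlap shares of one third and two thirds are hypothetical, and the function name is illustrative. The method suits counts such as population; statistics such as medians cannot be apportioned this way and need other treatment.

```python
def areal_weight(source_counts, overlaps):
    """Apportion source-zone counts to target zones by area share.

    source_counts: {source_zone: count}
    overlaps: {(source_zone, target_zone): share of the source zone's area
               that falls inside the target zone}
    Assumes each count is spread uniformly over its source zone.
    """
    estimates = {}
    for (src, tgt), share in overlaps.items():
        estimates[tgt] = estimates.get(tgt, 0.0) + source_counts[src] * share
    return estimates


pop_1980 = {"101": 3000}                                  # hypothetical 1980 count
shares = {("101", "101A"): 1 / 3, ("101", "101B"): 2 / 3}
print(areal_weight(pop_1980, shares))
```

The same function handles merges: two source zones that each fall entirely inside one target zone simply add together. Tobler's pycnophylactic method goes further by replacing the uniform-density assumption with a smooth density surface that still preserves each zone's total.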
One involves spatial interpolation, in which area-weighted estimates for one set of areal units are computed from another set. Returning to the tract 101 example, suppose one third of the area of 1980 tract 101 was assigned to 101A in 1990 and two thirds was assigned to 101B. Overlaying the two boundary files yields these area percentages, which can then be used as weights in a statistical comparison. Areal interpolation is a fairly mature research area, especially for census information, although there has been less work on integrating census data with other layers such as land use or land cover.

Data Models and Knowledge Organization Systems

A California regional dispatch operator gets a call about a new fire that has just been spotted in Sycamore Canyon. The caller adds that the fire is moving quickly up the west face of the canyon. The dispatcher does not know Sycamore Canyon or its location. Using a local geographic region profile to search the online National Map, the dispatcher enters "Sycamore Canyon" and obtains a coordinate footprint of the canyon from The National Map gazetteer. Using the returned footprint, the dispatch system zooms to the canyon's location. The dispatcher then selects an option within The National Map portal that uses the canyon footprint to query geospatial databases housed in several different locations. The query automatically retrieves information on roads, streams, land cover, houses, and fire hydrants within the canyon. The initial query results also offer a three-dimensional image of the canyon terrain, which the dispatcher selects. The dispatcher clicks the west wall of the canyon to select it and adds an annotation that the fire was sighted moving rapidly up this face. The National Map portal seamlessly integrates the retrieved streams, roads, houses, and land cover onto the three-dimensional display, and the dispatcher sends the assembled data set to the fire control and command center. With this information in hand, an emergency response team departs only minutes after the call was received.

Three elements of this scenario are important:
- immediate access to information based on a common place name;
- explicit representation of a landform feature (the canyon) as a queryable object in the database;
- explicit definition and representation of landform feature parts (the canyon wall) as objects.

The feasibility of these capabilities, and of rapid response in such a scenario, depends largely on the underlying data models (Box 3.6) and on supporting geographic knowledge organization systems (Hodge, 2000).
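The first two steps of the dispatch scenario can be sketched as follows: resolving a place name to a footprint, then pulling the features inside that footprint from several layers. Everything here is hypothetical: the gazetteer entry, the coordinates, the layer names, and the use of an axis-aligned bounding box as the footprint. A real gazetteer footprint may be a polygon, and the layers would be remote services rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned footprint in (lon, lat) degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


# Hypothetical gazetteer: place name -> footprint (coordinates are made up).
GAZETTEER = {"Sycamore Canyon": BBox(-118.95, 34.08, -118.90, 34.13)}

# Hypothetical layers standing in for remote databases: (feature_id, lon, lat).
LAYERS = {
    "roads": [("rd-1", -118.93, 34.10), ("rd-2", -118.80, 34.20)],
    "fire_hydrants": [("hy-7", -118.91, 34.12)],
    "houses": [("h-3", -118.99, 34.05)],
}


def features_for_place(name, gazetteer=GAZETTEER, layers=LAYERS):
    """Resolve a place name to its footprint, then return, per layer, the
    IDs of point features that fall inside that footprint."""
    footprint = gazetteer.get(name)
    if footprint is None:
        raise KeyError(f"no gazetteer entry for {name!r}")
    return {
        layer: [fid for fid, lon, lat in feats if footprint.contains(lon, lat)]
        for layer, feats in layers.items()
    }


print(features_for_place("Sycamore Canyon"))
```

The key design point is that the place name is used only once, to obtain coordinates. Every later query runs against geometry, so the layers need not share any naming convention.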
New data models and associated knowledge organization systems for The National Map can translate traditional topographic information into a flexible spatiotemporal knowledge base that serves many different application areas. Transforming The National Map database into a comprehensive geographic knowledge base can bring new dimensions to topographic information delivery. It can also revitalize the role of the USGS as a provider both of geographic information and of a valuable geospatial integration framework. CEGIS, as a research unit of the USGS, carries authority for research on topographic information. It can therefore play a critical formative role in developing ontologies or knowledge bases for topographic information. This section prioritizes and describes five research areas on which CEGIS could focus. They are listed below in the committee's priority order, based on the criteria presented earlier, beginning with the highest priority:

1. Geographic feature ontologies (e.g., hydrographic, terrain, or coastal feature ontologies)
2. Geographic feature data models based on these ontologies, and an associated gazetteer as an extension of the Geographic Names Information System (GNIS)
3. Quality-aware data models
4. Data models for time and change
5. Semantics-driven transaction processing

BOX 3.6 Data Models, Knowledge Organization Systems, and Ontologies

A data model is an abstract description of real-world entities for representation in a database management or information system. Data models are critical for effective representation and retrieval of information. They specify which entities are explicitly represented, their attributes, and the relationships among entities. The model determines how flexibly information can be accessed, how multiple versions of information can be managed, and how effectively information can be updated. Knowledge organization systems are an important complement to data models. They are formalized specifications of domain knowledge and include taxonomies, thesauri, gazetteers, and ontologies. They supply authoritative or community-sanctioned domain knowledge in forms that are explicit and shareable by both humans and computational systems.

Ontologies are one of a number of knowledge organization systems (Soergel, 2000) that help define and organize the information resources for a domain, discipline, or institution. Ontologies specify the kinds of concepts or entities that exist or may exist in some domain or subject area, and the relationships among them (Sowa, 1998). Through its specifications, an ontology provides formalized definitions of concepts, and of the expected relationships among them, for use by both humans and computers. Geographic feature ontologies matter for three reasons:
- they provide standardized definitions for use within the community;
- they give a basis for operational definitions of features that can support automated feature extraction;
- they offer a semantic reference framework for matching and exchanging information across communities.
Geographic feature ontologies formally specify topographic features, their parts and structures, and their relationships to other features. Such ontologies also provide the conceptual foundation for enhanced data models. These models are organized around features rather than map layers, and around the semantic structure of features rather than abstract geometric and topological structures.

The first two components are critical infrastructure for the geospatial community and beyond, and are therefore vital initial areas of CEGIS research.

RECOMMENDATION 5: The two priority research topics in the area of data models and knowledge organization systems should be developing geographic feature ontologies and building the associated feature data models and gazetteers.

The USGS science strategy (USGS, 2007) stresses the importance of data integration, and the two components in Recommendation 5 are critical to a data integration strategy. OGC standards have largely addressed integration and interoperability at the syntactic and structural levels, but integration at the semantic level needs attention. Effective semantic integration of information relies on knowledge supplied by ontologies and other knowledge organization systems. On the web, with its many distributed online repositories of information, geographic feature ontologies and an enhanced gazetteer become essential for effective geospatial information access, retrieval, and exchange. The geographic feature ontologies are a top research priority in particular because they build the conceptual foundation for the data model research components that follow. For example, data quality-aware features and smart transactions on features both depend on a clear and formal specification of these features.

Priority CEGIS Research Topic: Geographic Feature Ontologies

Information that a data model or knowledge source does not make explicit is not directly and easily accessible to users. The current National Map contains much geographic information, but it cannot respond to many types of requests for geographic information. This is because it lacks explicit representation of certain features, parts of features, and feature-feature relationships. Canyons, for example, are implicit within a terrain model (some may have labels in the GNIS), but generally they are not explicitly modeled features that one can query in a database. Other terrain and coastal features such as bays, coves, peninsulas, gulfs, and sounds, and some hydrologic features (e.g., oxbows), appear in The National Map layers but are not explicitly modeled. The current National Map model supports geometric and topological relationships among features, but not the semantic relationships people commonly use. Rivers and streams, for example, are connected geometrically and topologically into abstract networks. They are not, however, modeled and identified according to common semantic parts (e.g., mouth, source, and tributaries) as people understand them and are likely to request them. Thus, The National Map cannot respond directly to a simple geographic information query such as "Where are the mouth and tributaries of the Kennebec River?"

In another fire example, suppose a user wishes to retrieve all canyons in California involved in fires over the last five years.
The National Map currently cannot respond to such a query. One issue, of course, is that The National Map database holds no information on fire events. The critical point, however, is that The National Map cannot respond to any query on canyons (a basic topographic feature) because it simply does not know what a canyon is. A canyon is one basic topographic feature among many about which The National Map has no knowledge. If The National Map is to be more than a source of maps and become a source of comprehensive geographic information, then it must be made to "know" about basic geographic features. That is the purpose of the geographic feature ontology: to create formal specifications of canyons and other topographic features, and of their parts and structures, so that The National Map can perform automated computations and respond to queries on geographic features.

The USGS specification of the digital line graph was an early example of an essentially ontological specification of cartographic features. A new research increment building on this work is the development of semantic specifications of key geographic features. Geographic feature ontologies can formally define a set of key geographic features of interest to NGPO and, more generally, to the USGS. Recognizing the importance of research on this topic, NGPO has already begun exploring ontologies for The National Map and funded a prospectus project on this topic in the first round of CEGIS projects (Appendix D). This is a good start, but efforts should be coordinated to create a systematic focus on foundation topographic features. The topographic map served the central role in integrating map information. In the same way, an ontology that specifies the semantics of geographic features can provide a new and critical integration framework at the semantic level.

The United Kingdom (U.K.) Ordnance Survey has also begun research on ontologies (Goodwin, 2005). Coordinated development and knowledge sharing between CEGIS and the U.K. Ordnance Survey could benefit both. So could coordination with U.S. government agencies that use geospatial data, such as NGA, the Department of Homeland Security (DHS), EPA, the National Oceanic and Atmospheric Administration (NOAA), and the U.S. Department of Agriculture (USDA). In addition, the biology community has made substantial progress in developing ontologies (e.g., the Gene Ontology), and lessons from its work could be examined.

Research on ontologies is relatively recent, and ontology specification languages have only recently become readily available as open source and commercial products. A concentration of active researchers on geographic ontologies is based at the University at Buffalo. Research on geographic ontologies relevant to the USGS ranges from the development of upper-level ontologies (Smith and Mark, 2001; Fonseca et al., 2002; Grenon and Smith, 2004) to domain-specific ontologies (Feng et al., 2004; Sorokine et al., 2004; Sorokine and Bittner, 2005).
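To make the idea of The National Map "knowing" what a canyon or river is more concrete, the sketch below encodes a tiny, hypothetical ontology in Python. It defines concepts with an is-a parent and required parts, plus an explicit "flows into" relation between watercourses. Once parts and relations are stored explicitly, a question like the Kennebec River query becomes a lookup rather than a computation over network geometry. The concept names, part names, and geometry IDs are illustrative assumptions, not a USGS schema. A production ontology would more likely be written in an ontology language such as OWL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical miniature ontology: concept -> (parent concept, parts it must have).
ONTOLOGY: Dict[str, Tuple[Optional[str], Tuple[str, ...]]] = {
    "Feature": (None, ()),
    "Landform": ("Feature", ()),
    "Canyon": ("Landform", ("floor", "walls")),
    "Watercourse": ("Feature", ()),
    "River": ("Watercourse", ("source", "mouth")),
}


def required_parts(concept: Optional[str]) -> set:
    """Parts required by a concept, including any inherited through is-a."""
    parts = set()
    while concept is not None:
        parent, own_parts = ONTOLOGY[concept]
        parts.update(own_parts)
        concept = parent
    return parts


@dataclass
class Feature:
    name: str
    concept: str
    # part name -> geometry ID, or a list of IDs for multi-valued parts (walls)
    parts: Dict[str, Union[str, List[str]]] = field(default_factory=dict)
    flows_into: Optional[str] = None  # explicit "tributary of" relation

    def missing_parts(self) -> set:
        """Parts the ontology requires that this instance does not yet have."""
        return required_parts(self.concept) - set(self.parts)


def mouth_and_tributaries(river: str, features: List[Feature]):
    """Answer "Where are the mouth and tributaries of <river>?" from
    explicitly modeled parts and relations."""
    target = next(f for f in features if f.name == river)
    tributaries = sorted(f.name for f in features if f.flows_into == river)
    return target.parts.get("mouth"), tributaries


features = [
    Feature("Kennebec River", "River", {"source": "geom-17", "mouth": "geom-18"}),
    Feature("Tributary A", "River", {"source": "geom-21", "mouth": "geom-22"},
            flows_into="Kennebec River"),
    Feature("Sycamore Canyon", "Canyon", {"floor": "poly-3"}),
]
print(mouth_and_tributaries("Kennebec River", features))
print(features[2].missing_parts())
```

The `missing_parts` check shows one way an ontology can constrain a feature database. A canyon instance that lacks walls is flagged as incomplete rather than silently accepted.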
Specifying ontologies of features implies specifying objects, and objects are assumed to have complete and closed boundaries. However, geographic features commonly have indeterminate or ambiguous boundaries (Burrough and Frank, 1996). Ambiguous boundaries pose a challenge for developing topographic feature ontologies (Mark and Smith, 2004). The difficulty of delimiting many topographic features is likely why many of them have remained only implicitly represented in geospatial databases. The goal of feature definition should not be a single ultimate boundary but rather to allow for many boundaries. Just as features can have multiple names, they can have multiple boundary representations. Multiple boundaries for a feature in a database are in fact a logical expectation. They result from observations by different sensors, at different scales and times, and under different interpretation and processing. The suggested research questions in this area could be addressed in the short term.

Research Question: What are the key sets of topographic features portrayed within The National Map layers that should be explicitly represented in ontologies (these might align with the set of features already identified within the Spatial Data Transfer Standard; USGS, 1994)? (short term)

Research Question: What are the formal operational definitions for these features, their parts and structures, and their relationships to other features? (short term)

Research Question: What automated feature extraction methods are derivable from these operational definitions? (short term)

When framing some feature definitions, an important consideration is whether they can be translated into operational definitions useful for automated or semiautomated feature extraction. Consider, for example, the definition of a bay. A bay can be defined as a body of water partially enclosed by land but with a wide mouth, affording access to the sea. Automatic segmentation and extraction of bays from existing digital shoreline files or image sources, however, would require more detailed operational specifications. Most research in this area concerns landform classification based on digital terrain models. There is less work on identifying and extracting individual landform features from terrain models (Saux et al., 2004). There is less still that works from a landform ontology, although that is the objective of Mark and Smith (2004). Ontological specification of geographic features offers an effective formal basis for such work.

Priority CEGIS Research Topic: Ontology Driven Data Models and Gazetteers

Ontologies provide a framework for structuring information system content and clarify the things one wishes to model. An ontology can synthesize collective knowledge of the things that exist, their properties, and the relationships among them. Such synthesized knowledge of "what a canyon is" or "what a stream is" can provide a framework for organizing and integrating the myriad information gathered on such features. That information comes from different observers or sensors, at different scales and resolutions, and from different time periods. To illustrate the clarity an ontology provides, consider how one defines a canyon.
A canyon has a certain structure and behavior that can be specified in an ontology. Given this specification, all observations of canyons would be expected to be consistent with it. Multiple observations of any one canyon, however, can generate multiple versions of its location, size, shape, and spatial relations to other features, as well as many versions of its nonspatial attributes. This creates potential organizational and representational complications. The ontological clarity lies in recognizing that there is one underlying canyon with some invariant structure, and that different observations are simply multiple views of this entity. The ontology should capture the invariant structure of a feature type; for a canyon, that structure is a floor and walls. The feature database schema would then be expected to inherit this invariant structure from the ontology and to link it with the multiple representations obtained for each instance of the feature. A canyon and its invariant parts (walls and floor) could be associated with many spatial representations. These might include terrain representations of different structure and resolution, or temporal versions depicting different states (e.g., pre-landslide and post-landslide). An ontology thus provides a conceptual organizing framework for masses of heterogeneous data and information.

Developing a feature-based data model is a logical follow-on from geographic feature ontologies, and so it is a longer-term research question. There are also open research questions about the nature of the association between ontology and database. If the ontology is modified, how are the modifications propagated to the database? Do queries to the ontology propagate to the database?

Research Question: How does a geographic feature ontology operationally support a National Map feature database? (long term)

Figure 3.11 depicts a canyon and its parts as specified in an ontology (gray box). A canyon can have one floor and many walls. The associated feature database holds multiple spatial representations of canyon parts, which can be assembled to represent a canyon. A canyon floor can be represented by one or more polygons at different levels of detail. Its walls can be represented by one or more sets of triangulated irregular network (TIN) faces or elevation grids.

A feature ontology provides a conceptual foundation not only for a geographic feature database but also for an enhanced gazetteer for The National Map. A gazetteer is a knowledge organization source for geographic features. It manages, and translates between, heterogeneous forms of location representation: text (place names), geocodes, and coordinate footprints. A place name is an implicit location reference, but an important human-centered one. The gazetteer plays the essential role of connecting place names to other forms of location representation.
By connecting different location representations, a gazetteer can significantly expand human and machine search capabilities for geographic information. It is thus another vital component of a semantic integration framework. A three-way construct that integrates a geographic feature ontology, a gazetteer, and a geographic feature database could form the basis for a comprehensive geographic information integration framework. To illustrate the potential, consider a USGS researcher who wants to investigate the relationship between fire damage and landslide incidence. The upper part of Figure 3.12 shows an example extract of a fire event database from the dispatch office. Its location is a text field that contains place names. The lower part of Figure 3.12 shows the format of a landslide database compiled by extracting slide scars from satellite imagery. Its locations are given by coordinates that define polygons. With the two databases configured this way, fire and landslide events cannot be connected spatially.

FIGURE 3.11 Depiction of an ontology of a canyon and multiple spatial representations as stored in a feature database.

FIGURE 3.12 Example of incompatible event databases. The fire event database (upper box) has only a place name location representation, while the landslide scar database (lower box) has a coordinate location representation. Without a mechanism to translate between these location representations (as supported by a gazetteer), there is no basis for determining collocation or other spatial relationships among these events.
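A minimal sketch of a gazetteer-mediated join between the two event databases of Figure 3.12 follows. It assumes a hypothetical gazetteer that maps place names to bounding-box footprints, and compares those boxes with the bounding boxes of landslide-scar polygons. This is only a coarse candidate filter; exact polygon intersection, for example with a computational geometry library, would be a natural refinement. All record IDs, names, and coordinates are invented for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]               # (lon, lat)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_of(polygon: List[Coord]) -> BBox:
    """Bounding box of a polygon given as a list of (lon, lat) vertices."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two boxes share at least one point (edges included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def colocated_events(fires, landslides, gazetteer: Dict[str, BBox]):
    """Pair place-named fire records with coordinate-located landslide scars.

    fires: [(fire_id, place_name)]; landslides: [(slide_id, polygon)].
    Returns (candidate_pairs, unresolved_fire_ids)."""
    pairs, unresolved = [], []
    for fire_id, place_name in fires:
        footprint = gazetteer.get(place_name)  # place name -> coordinates
        if footprint is None:
            unresolved.append(fire_id)  # name not in gazetteer: cannot locate
            continue
        for slide_id, polygon in landslides:
            if bboxes_overlap(footprint, bbox_of(polygon)):
                pairs.append((fire_id, slide_id))
    return pairs, unresolved


gazetteer = {"Sycamore Canyon": (-118.95, 34.08, -118.90, 34.13)}
fires = [("F-1", "Sycamore Canyon"), ("F-2", "Unnamed Ridge")]
landslides = [
    ("LS-1", [(-118.92, 34.10), (-118.91, 34.10), (-118.91, 34.11)]),
    ("LS-2", [(-118.50, 34.50), (-118.49, 34.50), (-118.49, 34.51)]),
]
print(colocated_events(fires, landslides, gazetteer))
```

The sketch reports unresolved place names separately rather than dropping them. That list is exactly the kind of gap that better gazetteer coverage of vernacular names would close.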

Knowledge supplied by a gazetteer and a geographic feature ontology, linked to a geographic feature database, can make automated integration of the two event databases possible. The gazetteer translates between place names and coordinate footprints (Hill, 2006), and the ontology's specification of a canyon can define canyon structure and parts. Supporting this scenario requires further research on gazetteers, which could be addressed in the short term. New data models for the gazetteer can align with ontology development. They can also be consistent with ongoing work on digital gazetteer content standards developed as part of the Alexandria Digital Library Project (Hill, 2000; Hill and Goodchild, 2000; ADL, 2004), and with OGC web gazetteer services.

Research Question: How can the collection, validation, modeling, and management of vernacular names be facilitated? (short term)

Research Question: How can the creation of more detailed or smart feature footprints be automated? (short term)

Research Question: How can the implications of fuzzy footprints in gazetteers be managed? (short term)

The current GNIS lacks entries for many important named features whose locations and extents are fuzzy or incompletely defined, such as mountain ranges, valleys, plains, basins, gulfs, bays, and harbors. Ontology development may correct these limitations. Research on gazetteer development is not extensive, but relevant work includes:
- the Spatially-Aware Information Retrieval on the Internet (SPIRIT) project (Jones et al., 2003, 2004);
- work on fuzzy features and footprints (Alani et al., 2001; Wilson et al., 2004; Fisher et al., 2004; Purves et al., 2005);
- work on ontologies and gazetteers (Lutz and Klien, 2006).
More explicitly defined features potentially offer more expressive query power, but fuzzy footprints bring associated uncertainty. For example, queries such as "How many streams originate in the Appalachian Mountains?" or "How many islands are there in Casco Bay?" may become possible, but their attendant uncertainty must be addressed. The NGPO, which is responsible for GNIS, is in a key position to direct and coordinate gazetteer research and development. This short-term research initiative could contribute substantially to strengthening the spatial and semantic integration role of The National Map. Development of feature ontologies, ontology-driven data models, and the gazetteer must be coordinated. The hard work of defining feature classes (e.g., canyons, mountains, bays) and their boundaries is a prerequisite for specifying feature footprints in the gazetteer. An important role for CEGIS in such an effort is the official sanction or standardization it can lend to this process.
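One way to carry fuzzy-footprint uncertainty through a counting query, such as the Appalachian streams example, is to give each candidate point a membership grade in the region and report a range rather than a single number. The sketch below uses a hypothetical membership model: full membership inside a core box, decaying linearly to zero over a buffer measured in degrees. It reports the certain count, the membership-weighted expected count, and the possible count. The core box, buffer width, and points are illustrative. A real model would use projected distances and an empirically grounded footprint.

```python
import math


def membership(lon, lat, core, buffer):
    """Fuzzy membership of a point in a region (hypothetical model): 1 inside
    the core box, falling linearly to 0 at `buffer` degrees outside it."""
    min_lon, min_lat, max_lon, max_lat = core
    dx = max(min_lon - lon, 0.0, lon - max_lon)  # 0 when within the lon range
    dy = max(min_lat - lat, 0.0, lat - max_lat)  # 0 when within the lat range
    d = math.hypot(dx, dy)  # distance outside the core, in degrees
    return max(0.0, 1.0 - d / buffer)


def fuzzy_count(points, core, buffer):
    """Return (certain, expected, possible) counts of points in the region."""
    grades = [membership(lon, lat, core, buffer) for lon, lat in points]
    certain = sum(1 for g in grades if g == 1.0)
    possible = sum(1 for g in grades if g > 0.0)
    return certain, sum(grades), possible


# Hypothetical stream-source points and a made-up core footprint.
core = (-82.0, 35.0, -77.0, 41.0)
sources = [(-80.0, 37.0), (-76.5, 37.0), (-78.0, 41.5), (-74.0, 37.0)]
print(fuzzy_count(sources, core, buffer=1.0))
```

Reporting the triple instead of one number keeps the footprint's vagueness visible to the user, which is the point the text raises about attendant uncertainty.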

Standardizing operational definitions for topographic features, and hence for feature footprints, can lead to standardized algorithms for feature footprint extraction in place of many ad hoc approaches.

Quality-Aware Data Models

Uniformly high-quality data has been a signature characteristic of USGS topographic information. Now that many heterogeneous sources contribute to the databases of The National Map, new challenges arise for data quality assessment and management. In the past, quality control was standards driven and generally applied externally and globally. In other words, whole data sets were compared with independent sources of higher accuracy to assess compliance with the standard. New data may now be submitted from diverse sources, in small increments, as updates to individual features, and in a spatially and temporally ad hoc manner. In such an environment, new methods to assess data quality need to be explored. One promising approach is internal: the database has built-in redundancy and benchmarks for assessing quality. A feature database that supports multiple spatial versions of the same features, at different levels of detail or in different temporal states, creates the opportunity for such a quality-aware data model. Multiple versions of features create replicates and the potential for empirical distributions of feature states. Imagine many versions of the boundary of a lake, collected over time at different levels of detail from several different sources. As a database accumulates these versions, it gains the information to identify means, medians, and percentiles for feature attributes, including locations. Such a strategy trades the storage overhead of multiple versions for enhanced quality assessment. Tu et al. (2005) examine this problem of multiple-quality replica selection subject to an overall storage constraint.

If each feature has a distribution of observed values for its various properties, the database can be designed to use these distributions for quality assessment tasks. For example, they can serve as a basis for evaluating and categorizing incoming transactions from local cooperative partners. Suppose a partner submits a new GPS-generated road segment, and several versions of this road segment are already stored in the database. The new submission can be compared with the existing set to see whether it falls "close" to the mean of the set or is an outlier (which could represent either a legitimate change or an error). This concept raises a number of longer-term research questions around which a coherent research initiative could be built. In particular:

Research Question: How can sampling distributions of complex objects be defined and managed (e.g., reduce them to points in some N-dimensional shape space)? (long term)
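The shape-space idea in this research question can be sketched for line features such as the road-segment example. Each stored version is resampled to the same number of points by arc length, so every version becomes a point in a 2n-dimensional space. The mean shape, and the spread of versions around it, then support a simple test of a new submission. The thresholds (`redundant_tol`, `z_max`), the RMS distance, and the three-way labeling are hypothetical choices made for illustration. The sketch also assumes that all versions are digitized in the same direction.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, float]


def resample(line: List[Point], n: int) -> List[Point]:
    """Resample a polyline (2+ vertices) to n >= 2 points equally spaced by
    arc length, so versions with different vertex counts become comparable
    points in a 2n-dimensional shape space."""
    seg = [math.dist(a, b) for a, b in zip(line, line[1:])]
    total = sum(seg)
    out, i, walked = [], 0, 0.0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains this arc-length position.
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(max((target - walked) / seg[i], 0.0), 1.0)
        (x0, y0), (x1, y1) = line[i], line[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def rms_distance(a: List[Point], b: List[Point]) -> float:
    """Root-mean-square distance between corresponding points."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def classify_submission(versions, candidate, n=20, redundant_tol=1.0, z_max=3.0):
    """Label a submitted version 'redundant', 'accept', or 'outlier' (flag
    for review) relative to the stored versions of the same feature."""
    shapes = [resample(v, n) for v in versions]
    # Mean shape: average the k-th resampled point across all versions.
    mean_shape = [(mean(p[0] for p in pts), mean(p[1] for p in pts))
                  for pts in zip(*shapes)]
    spread = [rms_distance(s, mean_shape) for s in shapes]
    d = rms_distance(resample(candidate, n), mean_shape)
    if d <= redundant_tol:
        return "redundant", d
    # Floor the spread so identical stored versions do not divide by zero.
    sigma = max(pstdev(spread), 1e-9)
    if (d - mean(spread)) / sigma > z_max:
        return "outlier", d
    return "accept", d
```

Suppose the stored versions are straight segments offset 0, ±1, and ±2 map units from a common line. Then a candidate 0.3 units off is labeled redundant, one 1.5 units off is accepted, and one 20 units off is flagged for review. An "outlier" label here means "check", since it may reflect a real change rather than an error.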

The expectation is that multiple spatial versions of features would follow a normal distribution, but statistical tests would require specifying means and variances for these complex objects. Work at the University of Maine has investigated Least Squares Collocation (LSC) and geostatistics as diagnostic tools for positional accuracy (Agouris et al., 2001). The potential of simulated versus empirical sampling distributions could be explored. Various types of feature update transactions could then follow from the distribution specifications. For example, new versions of a feature that are close to the mean might be rejected as redundant. Outlier versions might likewise be rejected as errors or, alternatively, checked and retained as important variants that reflect change or a different temporal state. This research area is new in its context, namely embedding quality assessment within database transactions for complex spatial objects. It can nonetheless draw on ongoing work on least squares, geostatistics and high-dimensional statistics, indexing, and dimension reduction.

Data Models for Time and Change

The National Map data model does not currently support explicit representation of temporal states, change, and dynamic relationships among geographic features. Consequently, there is no framework for storing, retrieving, and accessing previous states of geographic features, changes, and events. For example, users cannot:
- query for past states (e.g., "get flood states for the Kennebec River for the last five years");
- query for feature states within a specified time interval and spatial range, and display the returned states (e.g., "retrieve the state of lakes for April 1-15 for latitudes 44-45 degrees");
- ask for projected views of future states (e.g., "what is the expected water level in Lake X for the month of April?").
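The second kind of query in this list, lake states for April 1-15 within latitudes 44-45 degrees, can be sketched against a simple version table. Each record in the table is one state of a feature, with a validity interval. The record layout is a hypothetical assumption, not The National Map schema: `FeatureState`, a single representative latitude per feature, and a `water_level_m` attribute. The key idea is an interval-overlap test combined with a spatial range test.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FeatureState:
    """One temporal version of a feature, valid over [valid_from, valid_to)."""
    feature_id: str
    valid_from: date
    valid_to: Optional[date]  # None means the state is still current
    lat: float                # representative latitude of the feature
    water_level_m: float      # example state attribute


def states_in_window(states: List[FeatureState], start: date, end: date,
                     lat_min: float, lat_max: float) -> List[FeatureState]:
    """States whose validity overlaps the closed window [start, end] and
    whose latitude lies in [lat_min, lat_max]."""
    hits = []
    for s in states:
        overlaps = s.valid_from <= end and (s.valid_to is None or s.valid_to > start)
        if overlaps and lat_min <= s.lat <= lat_max:
            hits.append(s)
    return hits


states = [
    FeatureState("lake-1", date(2006, 3, 1), date(2006, 4, 10), 44.5, 10.2),
    FeatureState("lake-1", date(2006, 4, 10), None, 44.5, 10.9),
    FeatureState("lake-2", date(2006, 1, 1), date(2006, 3, 31), 44.2, 5.0),
    FeatureState("lake-3", date(2006, 4, 1), None, 46.1, 3.3),
]
for s in states_in_window(states, date(2006, 4, 1), date(2006, 4, 15), 44.0, 45.0):
    print(s.feature_id, s.valid_from, s.water_level_m)
```

Here lake-1 returns both of its states, because its level changed partway through the window. That is the kind of within-interval history a single-snapshot data model cannot return.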
A range of physical process models require spatiotemporal inputs, including fire behavior models, ecosystem phenological models, and disease spread models (see McMahon et al., 2005, for more examples). These models could benefit substantially from the addition of the time dimension to The National Map. The history of spatiotemporal models for GIS (Armstrong, 1988; Langran, 1992) begins in the late 1980s. Early work viewed the spatial layer as the central unit of analysis and conceived of change as modifying the fabric of a layer (Langran, 1992). More recent approaches include an object change view and an event view, and spatiotemporal information queries can be performed using various spatiotemporal models (Yuan and McIntosh, 2002). Object change-based data models (Abraham and Roddick, 1999; Worboys, 2005) can track changes in geographic features. In contrast, the event-based model focuses explicitly on tracking the change itself (Claramunt and Thériault, 1995; Peuquet and Duan, 1995; Worboys and Hornsby, 2004; Worboys, 2005; Beard, 2006). The object-based model records changes in the properties of an object, whereas the event model treats the change itself as the explicit entity of interest.
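The difference between the two views can be sketched in Python, using a lake's pH record as a hypothetical case. The object view stores the lake's pH as a time-stamped property. The function below derives event-view records from that history: each run of steep declines becomes an event with its own onset, cessation, duration, and intensity. The per-observation threshold `min_step` and the record types are illustrative assumptions. A real detector would likely also account for uneven sampling intervals and measurement noise.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Object view: the lake is the entity; pH is a time-stamped property of it.
PhHistory = List[Tuple[int, float]]  # (day of year, pH), sorted by day


@dataclass(frozen=True)
class PhDropEvent:
    """Event view: the change itself is the entity, with its own properties."""
    lake_id: str
    onset_day: int
    cessation_day: int
    intensity: float  # total pH decline across the event

    @property
    def duration(self) -> int:
        return self.cessation_day - self.onset_day


def extract_drop_events(lake_id: str, history: PhHistory,
                        min_step: float = 0.3) -> List[PhDropEvent]:
    """Derive events from an object's state history: each maximal run of
    successive readings in which pH falls by at least `min_step` per step
    becomes one PhDropEvent."""
    events = []
    start = end = None
    for (d0, p0), (d1, p1) in zip(history, history[1:]):
        if p0 - p1 >= min_step:
            if start is None:
                start = (d0, p0)
            end = (d1, p1)
        elif start is not None:
            events.append(PhDropEvent(lake_id, start[0], end[0], round(start[1] - end[1], 3)))
            start = None
    if start is not None:  # a drop still under way at the last reading
        events.append(PhDropEvent(lake_id, start[0], end[0], round(start[1] - end[1], 3)))
    return events


history = [(0, 7.2), (30, 7.1), (60, 6.6), (90, 6.2), (120, 6.2), (150, 6.3)]
for event in extract_drop_events("lake-X", history):
    print(event, event.duration)
```

With this sample history, the gentle change from day 0 to day 30 and the slight rise at the end are not events. The steep decline from day 30 to day 90 becomes a single event object that can be queried directly.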

For example, assume that the average pH of a lake changes over the course of a year. In the object-based model, the primary object is the lake, and pH changes are tracked as nonspatial property changes to the lake. In the event view, the change itself (for example, an abrupt drop in pH) is the entity of interest. It carries type-specific event properties such as intensity, time of onset, duration or cessation, and location.

The USGS science strategy strongly advocates developing methodologies for change analysis, and The National Map has a role to play here. Data model enhancements that support time, change, and events are therefore central to the interdisciplinary science agenda of the USGS. The National Map has adopted the role of the topographic map as a framework for spatial information integration. Its potential as a framework for spatiotemporal information integration could also be investigated. Records of events and processes are a basis for understanding dynamic behaviors, and the USGS already collects and accumulates event data (seismic events, landslides, etc.). Environmental monitoring by other agencies and emerging sensor networks are creating repositories of information with high temporal resolution that support the analysis of change. In addition, fields such as hydrology, ecology, and biogeography widely use physical process-based spatiotemporal models that produce spatial snapshots at regular intervals (e.g., hourly, daily, annually). These models use geographic information layers as input, but traditional GIS databases do not accommodate them well. Data models that are both spatially and temporally explicit are therefore important. This is particularly true at the USGS, where hydrologists, ecologists, and geographers are adopting more quantitative modeling tools and considering using geospatial data to calibrate models, or vice versa.
Research in this area could therefore greatly benefit other USGS disciplines and other scientific agencies by developing new techniques for combining and analyzing spatiotemporal data. One role for The National Map in this setting is to provide the appropriate temporal and spatial contexts for analyzing change or event data and for supporting spatiotemporal process models. As an example, suppose researchers have detailed spatial records of burn scars for a set of historic wildfire events. They wish to run a fire model to examine how well it can replicate those events. Assume the fire model runs over a detailed landscape-terrain representation that includes roads, structures, and land cover. The researchers want to assemble the landscape settings that are most temporally consistent with each fire date. Ideally, they should be able to search The National Map database for the spatial and temporal location of each fire event. They would then retrieve the temporal versions of the terrain, roads, structures, and land cover most consistent with that fire date. This scenario illustrates one potential spatiotemporal support role for The National Map that CEGIS could address in the long term.
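Retrieving the landscape versions "most consistent with the fire date" could, under one simple assumption, mean choosing for each layer the most recent version captured on or before the fire date. The sketch below does this with a binary search over each layer's sorted snapshot dates. The catalog structure, layer names, and version IDs are hypothetical. Other consistency rules, such as taking the nearest version in either direction, are equally plausible depending on the study.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical catalog: layer -> [(capture date, version ID)], sorted by date.
Catalog = Dict[str, List[Tuple[date, str]]]


def versions_as_of(catalog: Catalog, when: date) -> Dict[str, Optional[str]]:
    """For each layer, pick the latest version captured on or before `when`
    (None if no version is that old). Preferring pre-event versions keeps
    the model's landscape from already showing the burn being simulated."""
    chosen: Dict[str, Optional[str]] = {}
    for layer, versions in catalog.items():
        dates = [captured for captured, _ in versions]
        i = bisect_right(dates, when)  # count of versions dated <= when
        chosen[layer] = versions[i - 1][1] if i > 0 else None
    return chosen


catalog: Catalog = {
    "land_cover": [(date(1992, 1, 1), "lc-1992"), (date(2001, 1, 1), "lc-2001")],
    "roads": [(date(1999, 6, 1), "roads-v1"), (date(2004, 6, 1), "roads-v2")],
    "structures": [(date(2005, 1, 1), "structures-v1")],
}
print(versions_as_of(catalog, date(2003, 8, 15)))
```

A `None` result, as for the structures layer here, signals that no pre-fire version exists. The researcher must then decide whether a later version is acceptable.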

Research Priorities 91

Research Question: What can be learned from spatiotemporal use cases for advancing spatiotemporal models for The National Map? (long term)

Research Question: How is change effectively represented in spatial data sets? (long term)

Research Question: How can process-based models be used to improve data quality or quality awareness in The National Map? (long term)

Semantics-Driven Transaction Processing

The contribution of data to The National Map and other USGS databases from multiple local data sources and partners has a real benefit in improving the update cycle and easing the burden of centralized data collection. Indeed, distributed, locally based geographic data collection stands to substantially help the USGS maintain current, locally verified, comprehensive databases of geographic information. However, such an approach can create a substantial new burden for transaction processing (insertions, modifications, and metadata management) on these databases. Insertion transactions are likely to become much more frequent (e.g., as sensors generate near-continuous data streams), to pertain more to individual features, and to generate more complex metadata records, given that data sources may include many heterogeneous technologies with potentially quite different accuracy or quality characteristics. Transaction processing also becomes more complex in the richer data model environments described above.

Revisiting the fire example, assume that the firefighters collected information on portable computers or from deployed sensors in the field as they were fighting the fire. At the end of the day, the goal is to distribute this information as updates to the appropriate databases. Suppose the information collected by the firefighters includes estimates and extents of burned areas and an inventory of burned structures.
Several long-term questions arise about the transaction processing logic for updating The National Map or other USGS databases. The firefighters are not expected to be database experts and so need support for simply uploading the data. There is, however, complex transaction processing logic behind uploading these data to the correct databases. For example, the records of burned areas could be added to a fire events database. In addition, it may be appropriate to update a set of land cover databases and associated products. The National Map includes several land cover and associated products, and the transaction logic would have to consider whether some or all of these should be subject to fire updates. The transaction logic might be such that only those products in which the resolution or granularity of cover classes matches the extent of the burn area are subject to updates; coarse land cover products might be immune to small burn area updates. At the other end of the granularity spectrum, if the fire data are sufficiently detailed to indicate differential burn on different cover types, the transaction logic might apply the differential burn damage information to the corresponding land cover classes. The information on destroyed structures requires a similar set of transaction decisions. Presumably the structures database should be updated with the information that structures X, Y, and Z were destroyed by fire on the given date. A follow-on question is which additional databases and sources should be updated. For example, should any high-resolution images depicting these structures be updated?

Research Question: What is the transaction processing logic for complex spatiotemporal transactions among National Map and other USGS databases? (long term)

Research in this area resides predominantly in the database research community. Transaction processing in general is a mature field, but spatial and temporal transaction processing and transaction processing in distributed database contexts are still relatively new. The OGC Web Feature Service, in its transactional form (WFS-T), addresses the ability to create, update, and delete geographic features in a distributed computing environment, and CEGIS may consider collaborating with OGC in developing distributed National Map transaction processing. Open research issues remain with respect to spatial transaction processing on multiresolution databases. Kafeza et al. (1996) and Rigaux and Scholl (1995) describe approaches to transaction processing in multiscale and multiresolution environments, and there is relevant work on transaction processing for mobile systems (Hampe and Sester, 2004). Hampe and Intas (2006) have recently proposed extensions to the OGC Web Feature Service standards to support transactions on multiple representation databases.
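The granularity rule in the fire example can be sketched as a planner that expands one field upload into database-specific updates. This is a hypothetical illustration, not a USGS design: it assumes each land cover product is characterized by a minimum mapping unit in hectares (one possible reading of "resolution or granularity of cover classes"), and the database names, product names, and thresholds are invented for the example. Only the structure identifiers X, Y, and Z come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class LandCoverProduct:
    name: str                   # hypothetical product name
    min_mapping_unit_ha: float  # smallest area the product can represent


@dataclass
class BurnReport:
    fire_id: str
    fire_date: date
    burned_area_ha: float
    # Optional differential burn: hectares burned per cover class.
    burned_by_class_ha: dict[str, float] = field(default_factory=dict)
    destroyed_structures: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Update:
    target: str  # hypothetical target database or product
    action: str


def plan_transactions(report: BurnReport,
                      products: list[LandCoverProduct]) -> list[Update]:
    """Expand one field upload into the updates each database should receive."""
    plan = [Update("fire_events", f"insert burn record {report.fire_id}")]
    for p in products:
        if report.burned_area_ha < p.min_mapping_unit_ha:
            continue  # product too coarse: immune to this small burn
        if report.burned_by_class_ha:
            # Detailed data: apply burn damage class by class where resolvable.
            for cover_class, ha in sorted(report.burned_by_class_ha.items()):
                if ha >= p.min_mapping_unit_ha:
                    plan.append(Update(p.name, f"mark {ha} ha of {cover_class} burned"))
        else:
            plan.append(Update(p.name, f"mark {report.burned_area_ha} ha burned"))
    for sid in report.destroyed_structures:
        plan.append(Update("structures",
                           f"set {sid} destroyed by fire on {report.fire_date.isoformat()}"))
    return plan


# Usage: a 12 ha burn reaches the fine product but not the coarse one.
products = [LandCoverProduct("fine_cover", 0.1),
            LandCoverProduct("coarse_cover", 100.0)]
report = BurnReport("F-001", date(2006, 8, 1), 12.0,
                    {"forest": 9.0, "shrub": 3.0}, ["X", "Y", "Z"])
plan = plan_transactions(report, products)
```

The firefighters supply only the `BurnReport`; the routing decisions live in the planner, which mirrors the separation between simple uploading and complex transaction logic described above. Follow-on targets, such as imagery depicting the destroyed structures, would be further branches in the same function.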
Transactions in spatiotemporal databases must address when, or how frequently, new versions or states of feature properties are updated. Some update transactions may be event driven, as in the case of the fire event described above. CEGIS might investigate which USGS or other agency databases record events (e.g., earthquakes, landslides, floods) and consider automating National Map database update transactions in response to such events.

SUMMARY

This chapter has laid out a recommended research priority structure for CEGIS, with three priority research areas, each broken down into research topics. The two highest-priority topics in each area are recommended for immediate action, with other important topics described as well for the purposes of longer-term research planning. Specific research questions for each topic are also suggested as potential starting points. Table 3.1 summarizes this research structure and is organized to show the broad research areas, the recommended research topics within those areas, and the committee's suggested initial research questions. Building on this foundation as resources allow and requirements evolve, CEGIS can expand its research portfolio to address a broader range of key GIScience issues of national relevance.

TABLE 3.1 Summary and Timelines of Recommended CEGIS Research Areas, Topics, and Questions (Bold = Priority Topic)

Research Area: Information Access and Dissemination

Research Topic: Innovative formats and designs to reinvent topographic maps in an electronic environment
1. What is the widest range of scales that can be mapped only by adjusting map symbols combined with selectively removing feature types? (short term)
2. What is the minimum amount of change to map symbols and content that provides the maximum scale range while maintaining topographic map usability? (short term)
3. What is the stability of topographic map design (with the goal of establishing a coherent set of designs that function from coarse to fine resolutions through scale change)? (short term)
4. What should be the visual hierarchies for the base National Map layers? (short term)
5. How should USGS select a subset of automated and manual approaches to visual hierarchies to provide a tool that effectively serves the largest number and variety of National Map users seeking to answer geographical questions that are not served by commercial point-to-point navigation tools (e.g., Google Maps, MapQuest, and Yahoo!)? (short term)
6. What is the optimal combination of types and number of symbols for an inexperienced user to create an effective topographic map and accommodate a data overlay on a topic of interest using web tools? (short term)

Research Topic: User-centered design for implementation of The National Map web services
1. With the goal of updating and evaluating The National Map viewer user interface, (a) what types of user interfaces are appropriate for The National Map viewers, (b) does The National Map need different viewers for different users and map contents or is a single one appropriate, and (c) what kinds of communication methods are effective for disseminating geospatial information through web browsers? (short term)
2. Will new web mapping technologies, such as vector-compression algorithms, AJAX, and Adobe Flex, improve the usability and system performance of The National Map servers and general web mapping applications? (short term)
3. What is an appropriate standardized user testing and evaluation method for assessing and improving the effectiveness of National Map products? (short term)

Research Topic: Open Geospatial Consortium (OGC) Standard Profiles for The National Map web mapping services and map layer design
1. How should USGS create OGC standard profiles (which are a subset of standard specifications and customized standard content) to bring layers in The National Map databases into conformance with OGC standards? (short term)
2. How can USGS overlay well-positioned labels with clear categories and hierarchies on top of symbolized features dynamically set to foreground and background depending on user interests? (short term)

Research Area: Integration of Data from Multiple Sources

Research Topic: Generalization
1. What are the specific new generalization operations and algorithms that will be needed for The National Map? (short term)
2. What feature-based generalization is needed for The National Map (the focus would be on a specific feature, such as a stream, and approaches needed for stream generalization) and how can that be accomplished? (long term)
3. What new kinds of measurements will be needed to determine locational conflicts between USGS features? (short term)
4. What are the effective scale ranges for fusing two layers together, and how does generalization affect fusion? (long term)

Research Topic: Data fusion
1. What are the data quality issues related to spatial data integration and fusion? (short term)
2. How can areal interpolation, as a key method for fusing aspatial data with spatial data, be applied in The National Map? (long term)

Research Area: Data Models and Knowledge Organization Systems

Research Topic: Geographic feature ontologies
1. What are the key sets of topographic features portrayed within The National Map layers that should be explicitly represented in ontologies (these might align with the set of features already identified within the Spatial Data Transfer Standard; USGS, 1994)? (short term)
2. What are the formal operational definitions for these features, their parts and structures, and their relationships to other features? (short term)
3. What automated feature extraction methods are derivable from these operational definitions? (short term)

Research Topic: Ontology-driven data models and gazetteers
1. How does a geographic feature ontology operationally support a National Map feature database? (long term)
2. How can the collection, validation, modeling, and management of vernacular names be facilitated? (short term)
3. How can the creation of more detailed or smart feature footprints be automated? (short term)
4. How can the implications of fuzzy footprints in gazetteers be managed? (short term)

Research Topic: Quality-aware data models
1. How can sampling distributions of complex objects be defined and managed (e.g., by reducing them to points in some N-dimensional shape space)? (long term)

Research Topic: Data models for time and change
1. What can be learned from spatiotemporal use cases for advancing spatiotemporal models for The National Map? (long term)
2. How is change effectively represented in spatial data sets? (long term)
3. How can process-based models be used to improve data quality or quality awareness in The National Map? (long term)

Research Topic: Transaction processing
1. What is the transaction processing logic for complex spatiotemporal transactions among National Map and other USGS databases? (long term)

A Research Agenda for Geographic Information Science at the United States Geological Survey

Comprehensive and authoritative baseline geospatial data content is crucial to the nation and to the U.S. Geological Survey (USGS). The USGS founded its Center of Excellence for Geospatial Information Science (CEGIS) in 2006 to develop and distribute national geospatial data assets in a fast-moving information technology environment. In order to fulfill this mission, the USGS asked the National Research Council to assess current GIScience capabilities at the USGS, identify current and future needs for GIScience capabilities, recommend strategies for strengthening these capabilities and for collaborating with others to maximize research productivity, and make recommendations regarding the most effective research areas for CEGIS to pursue. With an initial focus on improving the capabilities of The National Map, the report recommends three priority research areas for CEGIS: information access and dissemination, data integration, and data models, and further identifies research topics within these areas that CEGIS should pursue. To address these research topics, CEGIS needs a sustainable research management process that involves a portfolio of collaborative research that balances short and long term goals.
