Spatial Information Retrieval

 

Xuejiao Liu

INF 385F: WIRED

Fall 2004

 

Abstract

Information technologies are creating new ways of meeting the needs of spatial and cartographic information retrieval.  This paper describes the basic concepts of spatial information retrieval and introduces one of the most important elements of digital spatial information retrieval: Geographic Information Systems (GIS).  It also reviews the basic technology of GIS and its newest development trend, web-based GIS.

1         Introduction

A large proportion of the information available on the World Wide Web (WWW) refers to objects that include one or more spatial components.  Information with a spatial component is a core part of many research areas, such as geology, climate, and agriculture, and of business areas such as the retail industry and global trade (Jones, 2002).

There is a wealth of survey data, images, maps, and reports relating to specific places or regions available in databases and on the Internet.  These data sets range in scale from the size of the Earth and larger down to cities, streets, and the human body.  Spatial data mining, or knowledge discovery in spatial databases, is the process of extracting implicit knowledge, such as spatial relations, and finding interesting characteristics and patterns that are not explicitly represented in the databases (Koperski, 1998).  This technique plays an important role in understanding spatial data and in capturing intrinsic relationships between spatial and non-spatial data.

A Geographic Information System (GIS) is a computer-based system for storing, analyzing, and reporting map and spatial data, providing environmental, socioeconomic, and geographic information.  Web GIS, which combines GIS techniques with the Internet platform, is the new development direction of GIS.


2         Spatial data mining and knowledge discovery

Spatial data is defined as the location-related data of an object.  A spatial database stores spatial objects and the spatial relationships between them.  With the development of database technology and the wide use of database management systems, data volumes in databases are growing exponentially.  There is an urgent need for a new generation of computational theories and tools to extract implicit information from these rapidly growing volumes of data.  Spatial data mining (SDM), or knowledge discovery in spatial databases, focuses on extracting implicit spatial knowledge and discovering interesting characteristics and patterns that are not explicitly represented in the databases.  Spatial data mining can support intelligent spatial decision making and intelligent image processing (Koperski, 1998).

2.1      SDM theory and characteristic

Common spatial data mining techniques include sequence analysis, classification, regression analysis, clustering, association, and summarization.  The following paragraphs review three of these methods: classification, clustering, and association analysis.

Classification analysis

Classification is one of the core classes of techniques in data mining.  The task of spatial classification is to find a classification function or model that maps a data item in the database into one of several predefined categorical classes.  Classification analysis can be used to predict an extended description from the given data.  A training sample database is needed to build a classifier.  Methods for building a classifier include statistical methods, machine learning, and neural networks (Ge, 2003).
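The idea of mapping a data item into a predefined class by means of training samples can be illustrated with a minimal sketch.  It uses a nearest-neighbour rule, chosen here only because it is short; it is not one of the specific methods named above, and the training locations and class labels are hypothetical.

```python
import math

# Hypothetical training samples: (x, y) location -> predefined class.
TRAINING = [
    ((1.0, 1.0), "urban"),
    ((1.5, 2.0), "urban"),
    ((8.0, 8.0), "forest"),
    ((9.0, 7.5), "forest"),
    ((5.0, 1.0), "water"),
]


def classify(point, training=TRAINING):
    """Return the class of the training sample closest to `point`."""
    nearest = min(training, key=lambda sample: math.dist(point, sample[0]))
    return nearest[1]


# Classify two unlabeled locations.
for location in [(2.0, 1.5), (8.5, 8.0)]:
    print(location, "->", classify(location))
```

A real classifier would be trained on many more samples and on non-spatial attributes as well as coordinates.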

Clustering analysis

The task of spatial clustering analysis is to group unclassified data items into one of several classes and to describe these classes.  Data items in the same group are similar, and data items in different groups are dissimilar.  Clustering analysis is also used to find the functional relationships among the attributes of spatial objects and to represent them by mathematical functions with parameters.  Common methods of clustering analysis include statistical methods, machine learning, neural networks, and object-oriented data methods (Raymond, 1994).
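As a rough illustration of grouping unclassified items so that members of a group are similar, the sketch below runs plain k-means on a handful of points.  K-means is used here as a simple stand-in and is not the algorithm of the cited work; the points and starting centers are hypothetical.

```python
import math


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of the points assigned to it."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if no point was assigned
                centers[i] = (
                    sum(x for x, _ in group) / len(group),
                    sum(y for _, y in group) / len(group),
                )
    return centers, groups


# Hypothetical unclassified locations and two starting centers.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = kmeans(POINTS, [(0, 0), (10, 10)])
print(centers)
print(groups)
```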

Association analysis

Association analysis identifies relationships or affinities among items and features.  These relationships are then expressed as a collection of association rules.  The approach has been particularly successful in mining very large transaction databases.  By applying association rules to spatial data, we can find relationships between geographic locations.  Examples of such rules are "85% of big cities are beside a river" and "airports are always located beside a highway" (Ge, 2003).
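A rule such as "big cities are beside a river" is usually reported with a support and a confidence value.  The sketch below computes both for a small set of hypothetical records; the predicate names and the records are invented for illustration and do not reproduce the 85% figure quoted above.

```python
# Hypothetical records: one set of spatial predicates per city.
CITIES = [
    {"big_city", "beside_river"},
    {"big_city", "beside_river", "has_airport"},
    {"big_city"},
    {"beside_river"},
    {"big_city", "beside_river"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Share of records matching `antecedent` that also match `consequent`."""
    return (support(antecedent | consequent, records)
            / support(antecedent, records))


rule_support = support({"big_city", "beside_river"}, CITIES)
rule_confidence = confidence({"big_city"}, {"beside_river"}, CITIES)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```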

2.2      Spatial data management systems

Before GIS technology emerged as the primary land information management system, various types of computer systems were used for managing information about spatial objects, such as Automated Mapping (AM), Computer-aided Design (CAD), Land Information Systems (LIS), and Automated Mapping and Facilities Management (AM/FM).  Many functions of these systems have been incorporated into GIS.  Examining their advantages and limitations helps in understanding the development trends of GIS technology (Larson, 1996).

3         Geographic Information System (GIS)

A Geographic Information System (GIS) is a computer-based system for storing, analyzing, and reporting map and spatial data, providing environmental, socioeconomic, and geographic information.

GIS technology is to geographical analysis what the microscope, the telescope, and computers have been to other sciences.  GIS could therefore be the catalyst needed to dissolve the regional-systematic and human-physical dichotomies that have long plagued geography and other disciplines which use spatial information.  GIS is an integrating technology that links a number of discrete technologies into a whole that is greater than the sum of its parts.  GIS has emerged as a very powerful technology because it allows geographers to integrate their data and methods in ways that support traditional forms of geographical analysis as well as new types of analysis and modeling.

A regular map shows only spatial data such as cities, rivers, roads, and forests.  A geographic information system contains further information by linking attribute data to spatial data.  For example, population and unemployment rate can be linked to specific city areas.  This link creates intelligent map features and makes it possible to retrieve data for fact-finding, query the data with criteria, and perform spatial analysis and modeling, topological operations, and network analysis (Larson, 1996).

3.1      History of GIS

The concept of the geographic information system emerged in the 1950s.  With advances in computer science and their application to geographic surveying and cartography, people were able to use computers to collect, store, and process large volumes of spatial and geographic data and to support decision making with the results of data analysis.  In 1956, the department of topography of Austria first built a geographic database in a computer system.  After that, land mapping and management departments in other countries began to develop land information systems (LIS) for land management.

In 1963, the Canadian surveyor R.F. Tomlinson first used the term "geographic information system" and built the first GIS in the world, the Canada Geographic Information System (CGIS), which was used for natural resources management.  After that, many GIS organizations were established, which accelerated the development of GIS knowledge and technology.

In the 1970s and 1980s, GIS technology changed significantly due to the development of personal computer technology and database management systems.  The rapid growth in the number of GIS users created significant demand for access to geographic data, and GIS software became increasingly popular.  For example, the Environmental Systems Research Institute (ESRI) released its GIS software ARC/INFO in the early 1980s and had grown into a $40 million company by 1988.  GIS technology became an important area of study for governments, commercial companies, and universities.

In the 1990s, as the geographic information industry developed, GIS came to be used in a wide range of areas, and GIS software improved in several respects.  Open GIS efforts focused on establishing standards for GIS interoperability and data sharing, and the combination of relational database management systems (RDBMS) with GIS focused on using an RDBMS to store GIS data.  At the end of the last century, rapid improvement in Internet and WWW technology brought new opportunities for distributing geospatial data to a wider range of potential users.  Web-based GIS reflects this change.

3.2      GIS components

There are five components in GIS:  hardware, software, data, people and modeling.

Hardware

Hardware is the computer system.  GIS software can run on various kinds of hardware platforms, from central servers to PCs and from single machines to networked systems.

Software

GIS software provides the functions and tools needed to store, analyze, and display information about places.  GIS software is a powerful tool that manages and manipulates geographic information such as addresses or political boundaries and creates intelligent digital maps you can analyze, query for more information, or print for presentation.  GIS software is also a database management system.  Every GIS system uses at least one database.   GIS software provides an easy-to-use graphical user interface (GUI).

Data

A GIS can use data from a wide range of proprietary and standard map and graphics file formats, images, CAD files, spreadsheets, relational databases, and many more sources.  Data may be free or fee-based and comes from commercial, nonprofit, educational, and governmental sources, as well as from other GIS software users and organizations.  GIS data processing includes data input from maps, aerial photos, satellites, surveys, and other sources; data storage, retrieval, and query; data transformation, analysis, and modeling, including spatial statistics; and data reporting, such as maps, reports, and plans.

Developers and users

GIS technology would be of little value without developers to manage the system and plan how to apply it to practical problems.  The user base of GIS includes experts who design and maintain the system and general users in different fields who use GIS to complete their daily work.

Analysis and modeling

GIS inspires the development of spatial modeling in geography.  With the capability for quick data collection, massive storage and retrieval, and dynamic visualization, geographers have the opportunity to observe continuous "space in time", formulate spatial models, and test their models.  Those models can then be fed back into GIS to formulate more complex models, and so on.

3.3      Spatial database

A GIS stores information as a collection of themed layers that can be used together.  A layer can be anything that contains similar features, such as customers, buildings, streets, lakes, or postal codes.  Objects that contain spatial information are represented as several different layers, where each layer holds data about a particular kind of feature.

Each feature is linked to a position on the graphical image on a map and a record in an attribute table.
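The link between a feature's map position and its record in an attribute table can be sketched with a small data structure.  The class name, field names, and sample features below are hypothetical and are meant only to illustrate the layer idea, not any particular GIS file format.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One themed layer: feature positions plus a linked attribute table."""
    name: str
    positions: dict = field(default_factory=dict)   # feature id -> (x, y)
    attributes: dict = field(default_factory=dict)  # feature id -> record

    def add(self, feature_id, position, **record):
        """Store a feature's map position and its attribute record."""
        self.positions[feature_id] = position
        self.attributes[feature_id] = record

    def lookup(self, feature_id):
        """Follow the shared id from the map feature to its table record."""
        return self.positions[feature_id], self.attributes[feature_id]


# Hypothetical features, for illustration only.
streets = Layer("streets")
streets.add(1, (10.0, 20.0), name="Main St", lanes=2)
lakes = Layer("lakes")
lakes.add(1, (12.5, 18.0), name="Mill Pond", area_ha=3.4)

# A project is a collection of layers that can be used together.
gis = {layer.name: layer for layer in (streets, lakes)}
print(gis["lakes"].lookup(1))
```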

Fig 1 spatial modeling containing a number of layers
(ArcView Spatial Analyst, Environmental Systems Research Institute, Inc.)

Data capture

The basic methods of data capture are digitizing and scanning.  A new terrain is emerging, one made of electronically captured data, and new skills and knowledge are needed to make use of these resources.  With emerging tools such as GPS, satellite images, public data, the information superhighway, and the WWW, geographers now have more ways of obtaining data than ever.  Data accuracy, data sources, and temporal attributes have become important issues.

Data storage

GIS has opened up opportunities to describe geographic features in innovative ways.  For example, terrain elevation has traditionally been described using contours and spot heights.  The digital elevation model (DEM) is a newer way of describing elevations.

Fig. 2 Digital elevation data in GIS
(ArcView 3D Analyst, Environmental Systems Research Institute, Inc.)
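One common way to hold a digital elevation model is as a regular grid of elevation values.  The sketch below assumes such a grid; the elevations, the origin coordinates, and the cell size are all hypothetical values chosen for illustration.

```python
# Hypothetical 3 x 4 elevation grid in metres; row 0 is the northern edge.
DEM = [
    [120.0, 125.0, 131.0, 140.0],
    [118.0, 122.0, 128.0, 135.0],
    [115.0, 119.0, 124.0, 130.0],
]
ORIGIN_X, ORIGIN_Y = 500000.0, 4000000.0  # assumed upper-left corner
CELL_SIZE = 30.0                          # assumed cell size in metres


def elevation_at(x, y):
    """Return the elevation of the grid cell that contains point (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    if not (0 <= row < len(DEM) and 0 <= col < len(DEM[0])):
        raise ValueError("point lies outside the grid")
    return DEM[row][col]


print(elevation_at(500045.0, 3999965.0))
```

Unlike contours, the grid gives an elevation for every cell, which is what makes surface functions such as slope straightforward to compute.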

Spatial data supports software-based and organization-wide standards. The benefit of having software-based data standards is that the program is easier to use, and users can readily move data between systems and platforms.

Data display

GIS gives geographers an opportunity to present geographic information dynamically.  Data with spatial components (table 1) can be displayed in GIS software as graphic maps (figure 3).

3.4      An Example of GIS project

The following example shows a GIS project on oil and gas resources in the Gulf of Mexico region.  The project contains five layers (themes): wells, oil fields/discoveries, gas fields/discoveries, Federal lease blocks, and a U.S. states base map.  The project was built with the GIS software ArcView.  The following figure shows how the data is displayed in ArcView.

Fig. 3 GIS project: Gulf of Mexico 

The following table shows part of the raw data from the database "wells.dbf" of the wells layer.  The database contains information on 43,374 individual wells.  It was downloaded from the Minerals Management Service (MMS), a bureau of the U.S. Department of the Interior (http://www.gomr.mms.gov).  The data for the Federal lease blocks layer was downloaded from the USGS Coastal and Marine Geology Program's U.S. Gulf of Mexico internet map server (http://coastalmap.marine.usgs.gov/regional/contusa/gomex/gloria/data.html).  The data for the oil fields/discoveries and gas fields/discoveries layers was digitized by the author from a paper map, "Discoveries, Fields & Leases, Gulf of Mexico", provided by BP.  The U.S. states map comes from the software's basic map database.

ID  Sur_Longitude  Sur_Latitude  Well_NAME  SPUD_DATE  Total_Dept  Bottom_Block
1   -94.769323     26.954832                19980530   20011228    19980625
2   -94.769323     26.954832     ST01BP00   20011228   20020203    20011231
3   -94.700555     26.938640     ST00BP00   19961223   19970122    19970121
4   -94.700555     26.938640     ST00BP01   19970123   19980806    19970130
5   -94.688846     26.938940                20000705   20000929    20000810

Table 1 Wells data for Gulf of Mexico

After creating a project in ArcView, we can query the data to get information and solve problems.  A variety of queries can be performed, such as pointing at features on the map to identify them, finding which locations meet certain selection criteria, and analyzing spatial relationships between different phenomena to find out how they might influence each other.  For example, we can build a query expression in ArcView to find wells with "ID number less than 50 and surface latitude equal to or less than 26.9".  Building a query expression is a powerful way to select features because an expression can include multiple attributes, operators, and calculations.

Fig 4 query expression in ArcView
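The logic of such a query expression can be sketched outside ArcView as a filter over attribute records.  The sketch below is not ArcView syntax; it uses the five rows and the field names of Table 1, and the helper name select is hypothetical.

```python
# The five rows of Table 1, reduced to the fields used by the query.
WELLS = [
    {"ID": 1, "Sur_Longitude": -94.769323, "Sur_Latitude": 26.954832},
    {"ID": 2, "Sur_Longitude": -94.769323, "Sur_Latitude": 26.954832},
    {"ID": 3, "Sur_Longitude": -94.700555, "Sur_Latitude": 26.938640},
    {"ID": 4, "Sur_Longitude": -94.700555, "Sur_Latitude": 26.938640},
    {"ID": 5, "Sur_Longitude": -94.688846, "Sur_Latitude": 26.938940},
]


def select(records, predicate):
    """Return the records for which the query expression holds."""
    return [record for record in records if predicate(record)]


# "ID number less than 50 and surface latitude equal to or less than 26.9"
matches = select(WELLS, lambda w: w["ID"] < 50 and w["Sur_Latitude"] <= 26.9)
print([w["ID"] for w in matches])

# The same expression with a looser latitude bound, for comparison.
looser = select(WELLS, lambda w: w["ID"] < 50 and w["Sur_Latitude"] <= 26.94)
print([w["ID"] for w in looser])
```

None of the five sample rows lies at or below latitude 26.9, so the first query returns no wells from this excerpt; the full database would have to be searched to see the real result.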

The map in this GIS project can be printed with a title, scale bar, legend, and other graphics.  Layouts make it easy to produce presentation-quality maps.

Fig 5 Layout map from Gulf of Mexico project

3.5      Spatial analysis in GIS

With GIS software and a related database, we can display, query, and organize data geographically and solve problems by uncovering and analyzing trends and patterns.  The spatial analysis functions in GIS include distance mapping, proximity mapping, weighted-distance mapping, density functions, surface functions, and local statistical functions (ESRI, 1996).

Spatial queries are performed by creating a query set based on the spatial relationship of map features. Spatial operators in the queries define the spatial relationships that exist between map features.  Most spatial operators can be combined to answer complex spatial queries.  For example, a district manager for a chain of banks wants to find good locations for new banks.  The manager is most interested in areas far from the existing banks and with many people living nearby.  He can use GIS to analyze related data in the database and create a map of distance from banks and query the map for information on distance and population to find the best location for a new bank.  The result can be converted to an image file and displayed with other data in a final presentation.
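The bank example combines a distance map with a population query.  A minimal sketch of that combination on a small grid follows; the bank cells, the population counts, and the two thresholds are hypothetical, and straight-line distance between cell indices stands in for whatever distance function a GIS package would provide.

```python
import math

BANKS = [(0, 0), (4, 3)]  # hypothetical (col, row) cells of existing banks
POPULATION = [            # hypothetical residents per grid cell
    [500, 200, 100, 900, 300],
    [400, 100, 800, 700, 200],
    [300, 950, 600, 100, 100],
    [200, 100, 100, 400, 600],
]


def distance_grid(n_rows, n_cols, sources):
    """Distance map: straight-line distance from each cell to the nearest source."""
    return [
        [min(math.dist((col, row), s) for s in sources) for col in range(n_cols)]
        for row in range(n_rows)
    ]


def candidates(min_distance, min_population):
    """Cells that are far enough from every bank and populous enough."""
    dist = distance_grid(len(POPULATION), len(POPULATION[0]), BANKS)
    return [
        (col, row)
        for row, values in enumerate(POPULATION)
        for col, people in enumerate(values)
        if dist[row][col] >= min_distance and people >= min_population
    ]


print(candidates(min_distance=2.0, min_population=800))
```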

A simple four-step modeling process is used to solve spatial information analysis problems.  Most spatial models involve finding optimum locations.  The following example shows how to model a spatial problem in GIS.

Fig. 6 Spatial information retrieval from GIS 
(ArcView Spatial Analyst, Environmental Systems Research Institute, Inc.)

1) Stating the problem.  Suppose the problem is to find the best areas for opening a new store.  The desired result of the study is a map showing areas ranked from best to worst as potential locations for a new store.  This is called a ranked suitability map.

2) Breaking the problem down into a series of objectives.  Once the problem is stated, break it down into smaller and smaller pieces until it is clear what data and steps are required to solve it.  The problem of finding the best areas for a new store can be broken down into the following objectives: "Where are the good customers?  Are there enough of them?  Are they far enough from existing stores?"  To create a map of good customers, we need to define some characteristics of people who like the product.  Based on survey data and a data set of store locations and attributes, we select the successful stores, map their trade areas, and combine this with demographic data to verify that the good customers from the survey and the people living near the successful stores are the same people.  To determine whether there are enough customers, we need a map of the number of people within a store's trade area.  If the customer survey shows that most customers travel less than 3 miles, we need a map showing the number of people within 3 miles of every location in the study area.

3) Assigning suitability values to the objectives.  After a map layer has been created for each objective, we need to decide how the objectives will be combined into a single ranked map of potential store areas.  Assigning numeric values to the classes within each map theme makes it possible to compare one class with another.  In this case, each objective can be rated by how suitable it is as a location for a new store by assigning it a value on a scale from 1 to 10, with 10 being the best.  This is often referred to as a suitability scale.

4) Finding suitable areas.  Three weighted grids can be created from the customer map, the population density map, and the distance map, using the values assigned.  The weighted grids represent high percentages of good customers, high population density, and distance from existing stores.  Because the three resulting grids use the same weighting scheme, they can be combined to create the ranked map of suitable areas.
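Steps 3 and 4 can be sketched as a simple overlay: three grids scored on the same 1 to 10 suitability scale are added cell by cell, and the cells are then ranked.  The scores below are hypothetical, and equal weighting by plain addition is an assumption about how the grids are combined.

```python
# Hypothetical suitability scores (1 = worst, 10 = best) for a 2 x 3 study area.
CUSTOMERS = [[2, 5, 9], [3, 7, 8]]  # share of good customers
DENSITY = [[4, 6, 7], [2, 9, 5]]    # population density
DISTANCE = [[8, 3, 6], [9, 4, 2]]   # distance from existing stores


def combine(*grids):
    """Add the grids cell by cell; all grids must have the same shape."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*grids)]


def ranked_cells(grid):
    """Return (row, col, score) tuples ordered from most to least suitable."""
    cells = [
        (row, col, score)
        for row, values in enumerate(grid)
        for col, score in enumerate(values)
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)


suitability = combine(CUSTOMERS, DENSITY, DISTANCE)
print(suitability)
print(ranked_cells(suitability)[0])
```

If one objective mattered more than the others, each grid could be multiplied by a weight before the sum; that variation is not described in the text and is mentioned only as an option.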

4         Web GIS

Web GIS is a geographic information system whose data and functionality are made available over the Internet.  By combining GIS technology with Internet technology, web-based GIS can provide almost unlimited data access to GIS users around the world.  Web GIS provides a new platform for delivering spatial information and GIS services to a wider audience.

Web GIS has three components: web browsers, a web GIS information agency, and web GIS servers.  The basic Web GIS technique is not complex.  The GIS database and the applications that process the user's request reside on the server side.  On the client side is a user interface within a web browser.  Whenever a user submits a request, the server processes it with the GIS application program and returns the result to the user's computer.
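The request and response cycle described above can be sketched as a single server-side function that takes the query string sent by the browser and returns a status code and a JSON body.  The parameter names layer and max_lat, the in-memory layer store, and the JSON layout are all hypothetical; a real server would wrap this logic in an HTTP framework and read from a GIS database.

```python
import json
from urllib.parse import parse_qs

# Hypothetical server-side data: two wells taken from Table 1.
LAYERS = {
    "wells": [
        {"id": 1, "lon": -94.769323, "lat": 26.954832},
        {"id": 3, "lon": -94.700555, "lat": 26.938640},
    ],
}


def handle_request(query_string):
    """Process one client request and return (status code, JSON body)."""
    params = parse_qs(query_string)
    layer = params.get("layer", [""])[0]
    if layer not in LAYERS:
        return 404, json.dumps({"error": "unknown layer"})
    max_lat = float(params.get("max_lat", ["90"])[0])
    features = [f for f in LAYERS[layer] if f["lat"] <= max_lat]
    return 200, json.dumps({"layer": layer, "features": features})


# What the server would send back for a browser request such as
# ...?layer=wells&max_lat=26.94
status, body = handle_request("layer=wells&max_lat=26.94")
print(status, body)
```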

Web GIS provides flexible GIS services to the user.  The user can request GIS services and receive responses through web-based GIS at any time and from almost any place.  Web-based GIS is easy to use and inexpensive.  Users do not need to purchase and install expensive GIS software in order to access and work with maps and databases, and they do not need special training.  They do not have to become experts in sophisticated GIS applications, since the functionality is made available through a regular web browser and an integrated viewer with a simple, user-friendly interface.  Anyone who knows how to use the Internet can use web-based GIS.  Web-based GIS can distribute GIS data and geo-processing tools to a broader range of potential users than conventional GIS implementations may ever reach.

4.1      Web GIS application: online mapping

Maps are the main source of data for GIS.  In mainstream online mapping technology, servers generate maps as pictures in one of the standard raster graphic formats supported by graphical web browsers.  Interactivity is accomplished by delivering an updated map image in response to user requests.  Examples of Internet map server technology include web mapping sites powered by ESRI's ArcView IMS, MapObjects IMS, or ArcIMS, and by MapInfo's MapXtreme and MapXsite (MapInfo, 2000).

Popular online map services include "Yahoo Maps" (http://maps.yahoo.com), which provides maps of U.S. locations by address, intersection, or city and state, along with driving directions, and "Google Local" (http://local.google.com), which provides maps of local businesses and services on the web.

5         Conclusion

Managing large amounts of spatial data and finding useful information in widely distributed databases is a very important task.  After decades of development in spatial data mining and knowledge discovery, GIS has become one of the most popular techniques in spatial data mining.  Although there are still unsolved problems in web-based GIS applications, it is unquestionably the trend in spatial data mining.  More and more Web GIS systems are expected to appear on the Internet and make our lives easier.


References

K. Koperski, J. Han, and J. Adhikary. Mining knowledge in geographical data.  Communications of the ACM, 1998.

(http://citeseer.ist.psu.edu/koperski98mining.html)

Raymond, T. Ng. and J. Han. Efficient and effective clustering methods for spatial data mining. In Proc. of VLDB Conf., pages 144-155, September 1994.

(http://citeseer.ist.psu.edu/95801.html)

Ge, Ji-Ke, The Technology and Methods of Spatial Data Mining, Information College, South West Agricultural University, Chongqing, 2003.

(http://ir.hit.edu.cn/cgi-bin/newbbs/topic.cgi?forum=20&topic=53&show=75)

C.B. Jones, R. Purves, A. Ruas, M. Sanderson, M. Sester, M.J. van Kreveld, and R. Weibel, Spatial information retrieval and geographical ontologies - an overview of the spirit project. In Proc. 25th Annu. Int. Conf. on Research and Development in Information Retrieval, 2002.

(http://europa.eu.int/information_society/istevent/2004/cf/document.cfm?doc_id=531)

Ray R. Larson, Geographic Information Retrieval and Spatial Browsing, GIS and Libraries: Patrons, Maps and Spatial Information, edited by Linda Smith and Myke Gluck, Urbana-Champaign : University of Illinois, 1996. (p. 81-124).

Environmental Systems Research Institute, Inc., ArcView GIS, The Geographic Information System for Everyone, ESRI publication, 1996.

Environmental Systems Research Institute, Inc., ArcView Spatial Analyst, Advanced Spatial Analysis Using Raster and Vector Data, ESRI publication, 1996.

ESRI Virtual Campus, GIS Education & Training on the Web (http://campus.esri.com/).