Google Summer of Code 2013: Dept. of Biomedical Informatics, Emory University

Proposal for Hadoop-GIS: Extending Hive with Spatial Queries

by Xiling Sun for Dept. of Biomedical Informatics, Emory University

With the rapid development of positioning and imaging technologies, the volume of spatial data has grown exponentially. Managing and querying such massive spatial data poses two major challenges: the multi-dimensional structure of the data and the high computational complexity of spatial queries. This growth calls for a fast query system. In this project, we will build Hadoop-GIS by extending Apache Hive, a MapReduce-based data warehousing system, with spatial query capabilities. The main goal of Hadoop-GIS is to develop a highly scalable, cost-effective, efficient, and expressive integrated spatial query processing system for data- and compute-intensive spatial applications that can take advantage of MapReduce running on commodity clusters.