Google Summer of Code 2012 Apache Software Foundation

CUBE operation in Pig

by Prasanth Jayachandran for Apache Software Foundation

Computing aggregates over a cube of several dimensions is a common operation in data warehousing. In Online Analytical Processing (OLAP) systems, a cube is a way of organizing data along N dimensions so that analysis can be performed over some measure of interest. A measure is a numerical fact that can be algebraic (SUM, COUNT, etc.) or holistic (DISTINCT, TOP-K, etc.). The aim of this project is to provide support for cube computation over massive datasets using Apache Pig. It extends my current naïve implementation of the cube operator to support efficient cube computation for both algebraic and holistic measures.
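To make the idea concrete, here is a minimal Python sketch of a naïve cube computation. It is an illustration only, not the Pig implementation: the function names (`cube_groupings`, `naive_cube`), the sample rows, and the column names are all hypothetical. It aggregates one measure over every subset of the dimensions, computing a SUM alongside a distinct count.

```python
from collections import defaultdict
from itertools import combinations


def cube_groupings(dims):
    """Yield every subset of the dimensions, from all of them down to none."""
    for size in range(len(dims), -1, -1):
        for subset in combinations(dims, size):
            yield subset


def naive_cube(rows, dims, measure):
    """Return {(grouping, key): (sum, distinct_count)} for every grouping.

    Naive approach: each input row is emitted once per grouping,
    so the work grows as 2**len(dims) per row.
    """
    sums = defaultdict(int)
    distinct = defaultdict(set)
    for row in rows:
        for grouping in cube_groupings(dims):
            key = tuple(row[d] for d in grouping)
            sums[(grouping, key)] += row[measure]
            distinct[(grouping, key)].add(row[measure])
    return {k: (sums[k], len(distinct[k])) for k in sums}


# Hypothetical sample data: two dimensions and one measure.
rows = [
    {"region": "east", "product": "apple", "sales": 10},
    {"region": "east", "product": "pear", "sales": 5},
    {"region": "west", "product": "apple", "sales": 10},
]
result = naive_cube(rows, ["region", "product"], "sales")

# Grand total: grouping over no dimensions at all.
print(result[((), ())])  # (25, 2)
```

In this sample the sum for `apple` (20) can be obtained by adding the sums of the finer `(east, apple)` and `(west, apple)` groups, whereas the distinct count for `apple` is 1 even though each finer group also reports 1. That difference is why holistic measures cannot simply be merged from partial results and need separate handling.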