GSoC/GCI Archive
Google Summer of Code 2013

Apache Software Foundation

Web Page:

Mailing List:

Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including Citrix, Facebook, Google, Yahoo!, Microsoft, AMD, Basis Technology, Cloudera, Go Daddy, Hortonworks, HP, Huawei, InMotion Hosting, IBM, Matt Mullenweg, PSW GROUP, SpringSource/VMware, and WANDisco.

Our ideas page can be filtered by the labels documented at


  • #TAJO-34 - Outer Join Official project description: Tajo currently does not support outer joins. The parser support for outer joins is already implemented. In this issue, we should improve LogicalPlanner/LogicalOptimizer to handle outer joins. Then, we should add outer join support to existing physical join operators such as HashJoinExec and MergeJoinExec.
  • A better plan/data flow visualizer This project proposes an improved visualization. It should display a script at its different layers (flow, logical, physical and MapReduce), show operator types and aliases, and so on, more dynamically through a graphical tool such as D3.js. Based on these requirements, I propose an extension to Grunt that generates a zip file with the required graphs (flow, logical, physical and/or MapReduce) using D3.js.
  • A generic (Naked Objects) Android app, to run against Isis' Restful Objects interface Apache Isis is a Java framework for building domain-driven applications. Users are expected to focus on domain entities and application logic, and the UI representation is then generated by the framework itself. Isis can generate user interfaces at runtime, and already does so using Wicket, servlet/JSP, and a RESTful API over HTTP and JSON. This project is expected to bring these generic user interface capabilities to Android applications using the existing Restful Objects interface. The outcome should be a generic Android app that can dynamically generate user interfaces for any given domain model using Isis' Restful Objects interface.
  • A new modular UI for Apache CloudStack The aim of this project is to create a new modular UI for Apache CloudStack using Bootstrap by Twitter and Backbone.js. To achieve this easily, I'll be creating a RESTful wrapper API on top of the current CloudStack API. I hope this project will make custom UIs for CloudStack very easy.
  • A Web-based Workflow Composition UI for Apache Airavata Apache Airavata allows applications to be registered in a global registry and then composed into a workflow. This is done in a Java-based workflow composition service called XBaya, which supports application registration, workflow composition, workflow execution and monitoring. On completion, the project proposed here aims to replace the application registration and workflow composition UI provided by Airavata with one built with HTML5, JavaScript and CSS3.
  • Add Xen/XCP support for GRE SDN controller This project aims to enhance the current native SDN controller to support Xen/XCP and to integrate the open source SDN controller (FloodLight), which drives Open vSwitch through its interfaces.
  • Agent Based Modeling based geo-profiling of criminology projects The goal of this project is to extend the geo-profiling computational criminology projects. Previous work in this field uses a simple two-dimensional spatial grid with x, y coordinates to move the agents. This project instead uses the Google Maps API to move the agents around as we validate criminology theories such as Distance Decay and Routine Activity. In the future these could be modeled in SIS once rendering is fully supported.
  • AMQP Messaging protocol support for Airavata WS-Messenger WS-Messenger is a publish/subscribe type message broker implementation that is based on Web services. AMQP is an application layer protocol for message-oriented middleware that also supports publish/subscribe and store-and-forward message flow patterns. The objective of this project is to implement a robust, scalable and efficient MOM framework in WS-Messenger based on AMQP.
  • Apache Airavata Gateway Monitoring Dashboard Apache Airavata is a framework for managing computational jobs and workflows. Debugging is an essential part of managing a workflow, and this project would provide a user-friendly, lightweight webapp in HTML5 and JavaScript with simple visualizations and tools for administrators to monitor workflows. It would be an independent module that can be easily attached to an existing webapp or run as a standalone webapp.
  • Apache Cassandra backend for Sling Apache Sling is a web framework that uses a Java Content Repository, Apache Jackrabbit, as its underlying repository to store and manage content. Sling applications use either scripts or Java servlets, selected based on simple name conventions, to process HTTP requests in a RESTful way. Sling is well suited to implementing a web content management system: it’s simple to implement simple applications, while an enterprise-level framework is available for more complex ones.
  • Apache Celix Event admin implementation [GSoC] Event Admin Celix-48 I'm Erik, a computer science student from the Netherlands. I'm applying for a spot in Google Summer of Code to develop the Event Admin specification of OSGi in Apache Celix, an incubator project. I want to help Celix grow in terms of functionality as well as community.
  • Apache Gora support for Oracle NoSQL datastore Further expanding Apache Gora’s datastore support is key for becoming a standard persistence framework. The goal of this project is to extend the integration capabilities of Apache Gora. By implementing a new module for Gora, named Gora-OracleNoSQL, this project aims to offer a new NoSQL datastore which will enable Apache Gora users and developers to use the functionality of the enterprise-class Oracle NoSQL database and vice versa.
  • Apache OODT: Upgrade the CAS-Product Web Application to use JAX-RS via Apache CXF Apache OODT has a component called File Manager that stores files and associated metadata. The CAS-Product web application, another component of OODT, provides a RESTful interface for accessing data from the File Manager. The goal of this project is to use JAX-RS to implement a more powerful and extensible RESTful service for the CAS-Product web application.
  • Benchmark and Framework for Parallel XQuery VXQuery has created a parallel XQuery processor, yet there is no benchmark to show how well it performs. To show the power of a parallel implementation of XQuery, a benchmark and framework shall be set up using a set of queries that test parallel execution. The set of queries shall outline the strengths and weaknesses of VXQuery's implementation. The framework should return measurement details for determining performance.
  • Bloodhound: Embeddable tickets/objects Built on top of Trac, Apache Bloodhound is a recently released software project management application. Its main features include multi-project management, ease of installation and a user-friendly interface. It reduces the effort of tracking projects by providing a clean interface that connects revision control with wiki content and a bug database. Being able to reference tickets or other objects in a clear, attractive manner would be a great feature for the application, as many users share content on external websites. Posting a plain link requires other users to follow it individually, which isn’t very engaging for them. Posting the static content of the objects is not a good solution either, as the state of an object changes often and the description becomes outdated.
  • Cloudstack: LDAP user provisioning The aim of this project is to provide a more effective mechanism for provisioning users from LDAP into CloudStack. CloudStack currently supports LDAP authentication, but users must first be set up in CloudStack. Once a user is set up in CloudStack, they can authenticate using their LDAP username and password. This project will improve CloudStack's LDAP integration by enabling users to be set up automatically using their LDAP credentials.
  • CMIS (Content Management Interoperability Services) UCP (Universal Content Provider) for Apache OpenOffice using Apache Chemistry Universal Content Providers internally provide access to various data sources. This project involves coding a new UCP (a Java UNO component) to provide access to files stored in the file system of an OASIS-standard Content Management System using the Apache Chemistry (OpenCMIS) client libraries. A new sidebar panel supporting the implemented CMIS functions is also to be created.
  • Create an Email connector for Apache ManifoldCF Apache ManifoldCF is a framework that facilitates connecting source content repositories to target repositories so that content can be indexed and searched. Different repositories hold different types of content, and ManifoldCF uses repository connectors to connect to them. ManifoldCF does not yet have an email connector, even though email is of great importance in today's enterprises. This proposal is to implement an email connector for the ManifoldCF framework.
  • FOAF Co-reference Based Entity Disambiguation Engine In Apache Stanbol The proposed project focuses on developing an 'Entity Disambiguation Engine' in Apache Stanbol by computing co-referent relations in friend-of-a-friend (FOAF) datasets. On the web, the same entity (a person or an organization) can be referred to by different names, and the same name can refer to different entities, which leads to the 'name ambiguity' problem. This problem can affect the accuracy and relevance of results inferred by semantic engines, so effective disambiguation techniques are needed to process entities as part of the enhancement process. This proposal uses FOAF profiles as a data source and processes them to resolve name ambiguity effectively. FOAF is a vocabulary used to describe people, organizations and groups as linked data, forming an entity network on the web. The relationships among FOAF instances can be very useful for deriving new knowledge about entities using semantic techniques. Some FOAF instances may describe the same entity in different ways and with different information. It is therefore essential to identify which FOAF instances describe the same entity and to establish the co-reference relationships between them. The co-reference analysis can use FOAF attributes such as mbox, homepage, and weblog as unique identifiers (inverse functional properties) to match FOAF instances, identify co-referent clusters, and use them to disambiguate entities across the web. This project aims to develop a comprehensive disambiguation algorithm by identifying and clustering co-referent FOAF instances that describe the same entity. Furthermore, integrating FOAF profile support in Stanbol can help build a user network in content management systems and improve access control of content using social graph techniques.
  • Freebase Entity Disambiguation in Apache Stanbol The main goal of this proposal is to develop a disambiguation engine for the open source project Apache Stanbol using Freebase as its Knowledge Base. Apache Stanbol provides a set of reusable components for Semantic Content Management. One such component is the Content Enhancer, which can be used to extract concepts and entities from texts and link them with any Knowledge Base registered in Stanbol. This GSoC project would contribute all the development necessary to fully support Freebase in Stanbol, including disambiguation engines for this Knowledge Base.
  • Generic Naked Objects app in JavaScript for Apache ISIS This project grew out of my brainstorming over the idea given on the Apache Software Foundation ISIS project's GSoC Ideas page[4]. Dan Haywood has developed a jQuery Mobile demo for a Restful Objects viewer. It renders Restful Objects in a mobile app/website style of UI. As part of this project, I am to make it more generic. I liked this idea and immediately set about finding a solution. My proposed approach is to implement a generic viewer in JavaScript. A user who wants to develop an app should only need to provide server details, and we will render the UI for them. To that end, I would use jQuery, jQuery Mobile and PhoneGap so that the viewer can be ported to multiple mobile platforms. The implementation plan is explained in further sections.
  • Giraph integration with Tinkerpop Graph databases are DBMSs that provide efficient query processing, while Giraph targets efficient offline queries that can take several hours to complete. This difference is often not well understood. For this reason, my aim is to provide an API layer based on Tinkerpop that allows graph databases to be loaded into Giraph in order to run analytics that would not be possible in a graph DBMS.
  • Implementation of a Semi-Clustering Algorithm In Apache Hama Semi-clustering applies to social graphs, in which vertices typically represent people and edges represent connections between them. Edges may be based on explicit actions or may be inferred from people’s behaviour. Edges may have weights to represent the frequency or strength of interactions. A semi-cluster in a social graph is a group of people who interact frequently with each other and less frequently with others. What distinguishes it from ordinary clustering is that a vertex may belong to more than one semi-cluster. The Apache Hama Graph API is a good way to apply a semi-clustering algorithm to data stored in the Hadoop Distributed File System. A message-passing paradigm beyond the MapReduce framework offers more flexible communication, and the Bulk Synchronous Parallel (BSP) model fits the bill.
  • Implementation of Online Collaborative Filtering on top of Hama This proposal is about implementing Online Collaborative Filtering as described in the #HAMA-612 ticket.
  • Implementing a HTML5 Whiteboard for Apache OpenMeetings This project focuses on building the Whiteboard component of Apache OpenMeetings using HTML5 and Apache Wicket.
  • Implementing Hybrid Hash Join Operator in TAJO This proposal aims at implementing the well-known hybrid hash join algorithm in TAJO. This algorithm is a refinement of the grace hash join which takes advantage of more available memory. Therefore, end users should experience performance gains with the implementation of the hybrid hash join operator. The end results of this work include the hybrid hash join physical operator, unit tests, as well as experimental evaluation and performance analysis.
  • Improve Derby's Code Coverage Apache Derby is a relational database management system implemented entirely in Java that can be embedded in Java programs and used for online transaction processing. Derby uses the EMMA tool to measure code coverage. According to EMMA, many packages have code coverage below the acceptable level. This project aims to identify classes with poor code coverage and create tests to cover them.
  • Improve REST support for Axis2/C This project aims to add JSON support and multipart form-data support to Axis2/C, while improving the overall REST support for Axis2/C.
  • Improving CloudStack Support for Apache Whirr and Incubator-provisionr in Hadoop Provisioning The biggest challenge for Hadoop provisioning is automatically configuring each instance at launch time based on what it is supposed to do, a process known as contextualization. On EC2, contextualization is done by passing variables through the EC2_USER_DATA entry. Apache Whirr and Provisionr use this feature to provision Hadoop instances on EC2. This project aims to extend Whirr and Provisionr’s one-click solution on EC2 to CloudStack, and also to improve CloudStack’s support for Whirr and Provisionr so that Hadoop can be provisioned on CloudStack-based clouds. In addition, I will build a Query API compatible with Amazon Elastic MapReduce (EMR) to expose this functionality, so that users can reuse clients written for EMR to create and manage Hadoop clusters on CloudStack.
  • Incremental Graph Handling The purpose of this proposal is to make the Graph API of the Apache Hama project more dynamic by adding support for vertex addition and deletion during a superstep.
  • Introducing a JSON interface to Airavata Client side and Registry component Apache Airavata uses XML as its data format. However, with the web-based graphical client UI proposed in the Airavata GSoC 2013 master project, it would be useful to have an interface that works with the JSON data format. To do this, we will introduce a standard way to convert JSON messages to XML before handing them over to back-end components, and to convert XML messages to JSON before sending them to the Registry and the web client UI.
  • James Administration Console James is a complete and extensible enterprise-ready mail server solution. Currently, it has to be configured by manually editing its many XML configuration files or through a JMX client, a rather unfriendly approach. A web management interface would let the administrator handle configuration, management and monitoring in a much more user-friendly way.
  • jena-spatial: Simple Spatial Query with SPARQL In this project, I will develop an extension to Apache Jena ARQ, called jena-spatial, which combines SPARQL and simple spatial query. It gives applications the ability to perform simple spatial searches within SPARQL queries. Lucene spatial can be used for the spatial data indexing and searching.
  • LUCENE-3069 Lucene should have an entirely memory resident term dictionary I aim to improve codec performance by implementing a memory-resident term dictionary. The proposed work includes generalizing and refactoring the backend design of PostingsConsumer/Producer, implementing a pluggable, memory-resident term dictionary module, and optimization. I'll also try to improve related code in oal.util.FST and introduce a real data set for further experiments.
  • New Spring integration for Apache Axis2 Axis2 is a popular web service framework, but it has limited support for the Spring framework. This proposal aims to write a better Axis2-Spring integration that supports configuring Axis2 using Spring and deploying Axis2 web services, including JAX-WS services and Axis2 modules. The new module can be used as a standalone application or within a Servlet container.
  • OODT-219 [1] - Monitor that plugs into ganglia This project will contribute to the Apache OODT Catalog and Archive Service (CAS) Resource Management component a resource monitor plugin that reads information from Ganglia [2]. The plugin serves two purposes: collecting resource nodes’ status data on demand, and injecting custom metrics as needed.
  • Refactor Apache Rat Core to a Classic Object Oriented Design The core code for Apache Rat has difficulties which lead to a high bar for contributions: it is based on an experimental streaming architecture, it is hard to understand, and it is poorly covered by edge-to-edge tests. This project replaces it with a conventional object-oriented design with a clear model based on the domain.
  • Versioning of Synapse Configuration Artifacts This project enables a versioning strategy for Synapse configuration artifacts such as sequences, proxy services, APIs, and endpoints.
  • VXQuery integration with Apache Lucene This project will have two parts. 1. Design and implement the ability for users to create and manage text indices on collections of XML documents. 2. Implement functions in VXQuery that exploit these text indices to execute relevant queries efficiently.
  • Web Based Workflow Monitoring Tool The Workflow Monitoring tool is currently part of the XBaya graphical user interface. Here we propose a new web-based monitoring tool, which will be built as a separate module using HTML5, CSS3 and JavaScript. We also propose new features for the new monitoring tool in addition to the existing ones.
  • Web-based Workflow Execution Interface for Apache Airavata Apache Airavata allows the execution of workflows composed from registered applications. Currently this is done by the Java-based XBaya workflow suite, which includes a GUI for workflow composition, execution and monitoring.
  • Wider spectrum of data sources for Apache Giraph using Apache Gora Apache Giraph is a graph-processing framework that runs as regular Hadoop jobs in order to leverage existing Hadoop infrastructure. Giraph was built with the Pregel paper[1] in mind, but adds fault tolerance to the coordinator process by using Apache ZooKeeper as its centralized coordination service. It applies the bulk-synchronous parallel model to graphs: vertices send messages to other vertices in a given superstep. Apache Gora could provide a new vertex input format for Giraph and help Giraph offer a wider spectrum of data sources where graph processing could be done and stored. As Gora provides access to different data stores, the best configuration parameters for each of them should be tested in a graph-processing framework. This could be done by testing specific parameters with well-known algorithms implemented in Giraph, e.g. PageRank, shared connections, or others.
  • Xalan-J - Complete Support for StAXSource / StAXResult In the context of Google Summer of Code 2013, this document presents a proposal for the Xalan-J project from the Apache Software Foundation. The main goal of this proposal is to implement support for the StAXSource / StAXResult interfaces, introduced in JAXP 1.4.
  • XPath1.0 Implementation On Top of XMLStream XPath is used to navigate through elements and attributes in an XML document. Queries are written as XPath expressions, and an XPath engine parses and evaluates them over a context. Jaxen and AxiomXPath are widely used XPath engines. Currently available XPath engines consume an object model. Constructing the object model and navigating on top of it requires more memory and reduces performance; in addition, the whole XML stream is consumed when the object model is built. This project addresses that problem. Since evaluation speed is important in XPath processing, implementing an XPath engine on top of the XML stream rather than on an object model is a promising way to achieve high performance. This project therefore proposes to implement the XPath 1.0 specification on top of XMLStream.
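
As an illustration of the co-reference matching described in the FOAF entity disambiguation entry above, the following is a minimal sketch, not the project's implementation. It assumes profiles are given as plain Python dictionaries (a hypothetical stand-in for real FOAF/RDF data) and groups those that share a value for mbox, homepage, or weblog, using a union-find structure so that matches chain transitively. The function and variable names are invented for this example.

```python
from collections import defaultdict

# Inverse functional properties named in the proposal text; the dict-based
# profile layout used here is a hypothetical stand-in for real FOAF data.
IFP_KEYS = ("mbox", "homepage", "weblog")


def find(parent, x):
    """Return the representative of x's set, compressing the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def coreferent_clusters(profiles):
    """Group profile ids that share at least one IFP value, transitively."""
    parent = {pid: pid for pid in profiles}
    first_seen = {}  # (property, value) -> first profile id carrying it
    for pid, attrs in profiles.items():
        for key in IFP_KEYS:
            value = attrs.get(key)
            if value is None:
                continue
            owner = first_seen.setdefault((key, value), pid)
            union(parent, owner, pid)
    clusters = defaultdict(set)
    for pid in profiles:
        clusters[find(parent, pid)].add(pid)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    example = {
        "a": {"mbox": "mailto:alice@example.org"},
        "b": {"mbox": "mailto:alice@example.org",
              "homepage": "http://example.org/~alice"},
        "c": {"homepage": "http://example.org/~alice"},
        "d": {"weblog": "http://example.org/blog/bob"},
    }
    print(coreferent_clusters(example))  # [['a', 'b', 'c'], ['d']]
```

Profiles "a" and "c" share no attribute directly, but both match "b", so all three land in one cluster. A real engine would also need to normalize values (for example, case and trailing slashes in URLs) before comparing them.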