GSoC/GCI Archive
Google Summer of Code 2011

Apache Software Foundation

Web Page:

Mailing List:

The Apache Software Foundation provides organizational, legal, and financial support for a broad range of open source software projects. The Foundation provides an established framework for intellectual property and financial contributions that simultaneously limits contributors' potential legal exposure. Through a collaborative and meritocratic development process, Apache projects deliver enterprise-grade, freely available software products that attract large communities of users. The pragmatic Apache License makes it easy for all users, commercial and individual, to deploy Apache products.


  • [HAMA-359] Development of Shortest Path Finding Algorithm My scenario is the following: use HBase to hold an adjacency list (or matrix) of graph nodes and weights, then implement a distributed version of Dijkstra's algorithm. See the additional info link for better formatting. (PDF)
  • [XERCESJ-1429] [GSoC]: Asynchronous LSParser and parseWithContext Apache Xerces2-J is a high performance and fully compliant XML parser written in Java to parse, validate and manipulate XML documents. The goal of this project is to complete the implementation of the DOM Level 3 LSParser. It focuses on two areas that still need to be implemented according to the W3C recommendation: the asynchronous version of LSParser, and parseWithContext().
  • Add JMX management capabilities to Apache Tuscany The goal of this project is to add JMX management and monitoring capabilities to the Apache Tuscany runtime. The work starts by adding basic monitoring to view running Tuscany nodes from a JMX agent, then incrementally enhances the support to include management and monitoring of runtime parts.
  • Adding face recognition functionality to PhotArk Apache PhotArk is an open source photo gallery application with many features. For such a photo gallery application, face recognition would be another handy key feature: the face recognition engine can be used to enhance and extend the scope of PhotArk's current gallery features.
  • Adding Social Features to Apache PhotArk The web is becoming more social, and adding social features to Apache PhotArk would cater to users who require such features and make the application more popular. The goal of this project is to build a back-end API in PhotArk that supports social features conforming to the OpenSocial specification, to provide user interface components that enable social features, and to integrate the OpenSocial API using reference implementations like Apache Shindig.
  • Adding xml:id support to Apache Xerces2 Apache Xerces2 is a high-performance, standards-compliant processor written in Java for parsing, validating, serializing and manipulating XML documents. xml:id provides a formal convention for attributes expressing unique identifiers for elements in XML documents. The objective of this project is to provide xml:id support to Xerces2 by implementing an xml:id processing module.
  • Apache Synapse - Automation Framework for Synapse Samples Synapse provides a large number of samples to help developers get started. Currently these samples have to be set up and run manually, which is a tedious and time-consuming process. The aim of this project is to develop an automation framework that handles all the tasks involved in trying out the samples, enabling new users to get started quickly.
  • Baum-Welch Algorithm on Map-Reduce for Parallel Hidden Markov Model Training. This project proposes to extend Mahout's sequential implementation of the Baum-Welch algorithm to a parallel, distributed version using the Map-Reduce programming framework to allow Hidden Markov Model training at a large scale for enhanced model fitting.
  • Bndtools based OSGi bundles maker project We want to build a bndtools-based OSGi bundle maker that helps analyse a Java application and split the whole project into several OSGi bundles. The tool can analyse source code, suggest splits and refactorings at varying granularity, and show the analysis result in a GUI view where the split can be adjusted manually; it then splits the project into several OSGi bundles/projects.
  • Code/Test Separation and add pure webdriver integration for OFBiz I am Ganath Rathnayaka, a student at the University of Peradeniya, Sri Lanka. Apache OFBiz is enterprise automation software used by many companies. In OFBiz, the source code and the functional tests are in the same package, which makes the source code more complicated. My project is to separate the source code and the tests into different packages and to integrate WebDriver to add functional testing to OFBiz.
  • Create an Eclipse plugin that will help to synchronize work between a Java project in Eclipse and Cayenne Modeler Apache Cayenne is an open source persistence framework that provides object-relational mapping (ORM) and remoting services. The idea of this proposal is to create an Eclipse plugin that integrates Apache Cayenne Modeler with the Eclipse IDE.
  • Cross-site request forgery protection for Apache Tapestry Tapestry is a component-oriented framework for creating dynamic, robust, highly scalable web applications in Java that lacks a built-in mechanism to protect web applications against cross-site request forgery[1]. The goal of this project is to create a Tapestry built-in protection mechanism that secures Tapestry applications against CSRF attacks. [1] The Open Web Application Security Project -
  • Derby Test and Fix Derby Test and Fix primarily entails providing a framework for converting store recovery harness tests to JUnit tests, as well as the platform to convert the existing tests. The project also entails fixing existing bugs in Derby by analyzing the causes behind the failures.
  • Derby Test and Fix It's valuable to replace existing logic in the Derby test harness with JUnit code and to fix existing bugs. With the new JUnit tests, Derby would gain all the benefits of JUnit, such as running tests from ant, integration with IDEs, the ability to hook into other JUnit suites, and an easier understanding of how Derby tests are run. I will focus on 'Make testcases pass in non-English locale', 'Convert remaining store SQL tests into JUnit tests' and 'Converting ij[1-4].sqls to ScriptTestCase'.
  • Design and implement a distributed mailbox using Hadoop The mailbox subproject supports maildir, SQL database (via JPA) and Java Content Repository (JCR) as technologies for mail storage. The goal of this project is to implement mailbox storage as a distributed system on top of Hadoop HDFS using the James mailbox API.
  • Develop a 'NoSQL' Datastore component for Apache Cassandra, CouchDB, Hadoop/Hbase I am Eranda Sooriyabandara, a student in the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. Apache Tuscany provides a comprehensive infrastructure to simplify the task of developing and managing Service Oriented Architecture (SOA) solutions based on the Service Component Architecture (SCA) standard. My ultimate goal in this project is to create SCA portable data store component(s) over a number of 'NoSQL' databases such as Apache Cassandra and CouchDB.
  • Develop a simple tool that can be used to generate composite diagrams Apache Tuscany provides a comprehensive infrastructure to simplify the task of developing and managing Service Oriented Architecture (SOA) solutions based on the Service Component Architecture (SCA) standard. Tuscany Java SCA is a lightweight runtime that is designed to run standalone or provisioned to different host environments. The task is to implement a tool that generates composite diagrams from the composite files to illustrate the SCA artifacts and their wirings.
  • Develop fluent API / facade to HttpClient Develop the fluent API to HttpClient based on code currently maintained by Apache Stanbol and Apache Sling projects to create a more convenient way to use the HttpClient package.
  • Develop websocket binding for Apache Tuscany The goal of this project is to enable SCA components to expose services that will allow browser clients to communicate with them as well as to enable inter-component communication via the websocket protocol. The nature of the protocol will offer new potential in the asynchronous communication offered by Apache Tuscany and will align the framework with the HTML5 technologies.
  • Eclipse WTP based Tapestry visual editor project Tapestry is widely used today: it allows a clean separation between Java and HTML, and makes it possible for design work on the application to continue well after the code has been completed. It is becoming more and more popular. However, there is no proper Tapestry visual editor. Eclipse WTP is popular in web application development but does not support Tapestry, so I think it is a good idea to build a Tapestry visual editor on Eclipse WTP.
  • Enable Lucene to take advantage of low-level IO options (direct IO) and generalize its Directory implementation My project aims to generalize the current Lucene Directory implementation by making it a UnixDirectory. This would be done by adding IOContext to the lower level API. These are two existing Lucene tasks (LUCENE-2793 and LUCENE-2795).
  • HTML5 Mobile Template for Apache Roller Roller is a full-featured, multi-user and group-blog server suitable for blog sites large and small. It has features such as multi-user and group blogging, comment moderation, spam prevention, RSS 2.0 and Atom 1.0 support. This project will modify the Roller blog rendering system to support mobile-ready blog theme templates, and provide one example theme that uses these features in combination with HTML5 to show full-sized pages to desktop browser users and small format pages to mobile users.
  • Implement an OASIS-style XML Catalog for Apache Woden and resolve existing issues related to URI/Entry resolving mechanism. Apache Woden is an open source framework that implements the W3C WSDL 2.0 specification and is used for WSDL 2.0 manipulation in popular frameworks such as Apache Axis2. OASIS XML catalog defines a standard catalog format for XML URIs and entities, and it also defines a resolver mechanism. The main objective of this proposal is to replace Woden's simple URI resolving mechanism with an OASIS XML catalog and to resolve existing issues reported by users of Woden and Axis2.
  • Implementation of Nested Cross for Pig Latin I am a graduate student from the National University of Singapore, and am a fan of both open source and big data. This motivates me to apply to GSoC'11 with Apache Pig, an OLAP system based on Hadoop. Currently, Pig Latin does not support a nested "cross" statement inside "foreach", which can be a useful feature. One typical use case is flattening the records of the "cogroup" of two relations for each enumerated item in the nested block. This application details my plan to implement the nested cross.
  • Key-value data service component for Apache Nuvem. Apache Nuvem defines an open API which abstracts common cloud platform services to help decouple the application logic from the particulars of a specific proprietary cloud. Highly scalable and distributed structured key-value data services are essential features for today's enterprise applications. The key deliverable of this proposal is to implement a key-value data service component that abstracts cloud data services and standalone data services.
  • LUCENE-1768: NumericRange support for new query parser Apache Lucene supports indexing and searching for numeric types. This allows Lucene to support faster range queries, since building the field cache is much faster than using text-only numbers. One of the big limits today is the lack of support for numeric range queries in Lucene contrib query parser, which still only supports text range queries. This project proposes to implement numeric support in contrib query parser.
  • LUCENE-2308 Separately specify a field's type The goal of this project is to refactor the Lucene Field API by introducing a new FieldType class that separates Field values from their properties and opens the way for easier Field extensions. This will result in more understandable instantiation of similar fields across documents. The Field class, as part of the core API, is very sensitive to shallow design or implementation, which can cause drastic performance degradation due to its massive usage all over the Lucene and Solr projects, making this a challenging task.
  • LUCENE-2959: Implementing State of the Art Ranking for Lucene Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions.
  • LUCENE-2979: Simplify configuration API of contrib Query Parser The Lucene contrib query parser has a configuration API that was inherited from the token stream API, which uses AttributeSource and Attributes to share token information across token filters. However, the use of this Attribute API in the contrib query parser makes configuration much more complex than it needs to be. This project proposes to simplify this API by using a map data structure instead of the complex Attribute API.
  • Manila The main goal is to provide a testing framework which allows both server-side and client-side assertions as well as the possibility to run the tests with different configurations. Manila is the implementation of the automated tests for web applications as described in
  • Parallel Viterbi algorithm for Hidden Markov Model The Viterbi algorithm is an evaluation algorithm for Hidden Markov Models. It estimates the most likely sequence of hidden states, called the "Viterbi path", from a given sequence of observed states and is also used in Viterbi training. This project is intended to bring parallel evaluation functionality for HMMs to Mahout and also investigates how such dynamic programming algorithms could be implemented in the MapReduce paradigm.
  • Right Click Menu, grid enhancements and two optional components for the Apache Tapestry5 Java web application framework This project consists of two parts. The first part will produce a generic and powerful Right Click Menu (also known as Context Menu) component for Tapestry5. The Right Click Menu component is meant to be highly configurable and ready for enterprise scenarios. The second part will provide enhancements for the existing powerful T5 grid component: bookmarkable URLs with the grid's sort parameters, page number and number of items per page. Two optional tasks are also proposed.
  • Seam-Forge plugin for MyFaces CODI The goal of this project is to implement Seam-Forge plugins for generating CODI artifacts as well as simple application-templates which are using features provided by CODI.
  • Sugar for Pig Syntactic sugar features to simplify the use of Pig. -- Variable argument for SAMPLE and LIMIT. Currently, SAMPLE and LIMIT only take a constant argument. It would be better to be able to use a variable (scalar) in the place of a constant. -- Default SPLIT destination. SPLIT partitions a relation into two or more relations. It would be useful to have a default destination for tuples that are not assigned to any other relation, in a fashion similar to a switch/case/default statement.
  • Support to 2 level nested foreach Pig currently supports DISTINCT, FILTER, LIMIT, and ORDER BY inside a nested foreach statement; support for a FOREACH nested inside a foreach is highly desired.
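
For readers unfamiliar with the "Viterbi path" mentioned in the Parallel Viterbi project above, the following is a minimal sequential sketch in Python. It is not Mahout code and not the parallel MapReduce version the project proposes; the function name, the dictionary-based model layout and the toy model in the usage note are assumptions made purely for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence.

    Hypothetical layout: start_p[s], trans_p[prev][s] and emit_p[s][obs]
    are plain dictionaries of probabilities.
    """
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}

    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximises the probability of reaching s.
            prob, prev = max((best[p][0] * trans_p[p][s], p) for p in states)
            step[s] = (prob * emit_p[s][obs], best[prev][1] + [s])
        best = step

    # The Viterbi path is the best path over all possible final states.
    return max(best.values(), key=lambda item: item[0])
```

With an illustrative two-state model (states "Healthy" and "Fever", observations "normal", "cold", "dizzy"), a call such as `viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)` returns the probability together with the list of hidden states. Each time step depends only on the previous one, which is the dynamic programming structure the project sets out to map onto MapReduce.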