Google Summer of Code 2015


License: GNU Library or "Lesser" General Public License (LGPL)

Web Page:

Mailing List:

OncoBlocks is a new open-source initiative, currently hosted at the Biostatistics and Computational Biology Department of the Dana-Farber Cancer Institute, in Boston, MA.

The goal of the project is to create reusable, open-source software components that support cancer genomics research and enable precision (or "personalized") cancer medicine.

These components can then be used and reused in multiple research and clinical contexts. They may also form the basis of new features within the cBio Cancer Genomics Portal, another open-source project, originally created at Memorial Sloan-Kettering Cancer Center. For additional background on the cBio portal and cancer genomics, please see our references: Reference 1, Reference 2.

OncoBlocks is currently in its start-up phase, and we are hosting prototype code on GitHub. The mentors have a long track record of commitment to open-source software, including Cytoscape, a previous GSoC participant.

Our ideas page is located at:  

How to Get Involved:

Due to high student interest (hooray!), we have set up a number of online Basecamp forums where students can post questions, and we have posted additional background information there.

Register for Basecamp Forum


  • Extraction of Clinical Trial Biomarkers: The goal of this project is to build a prototype tool for extracting and reviewing genomic markers from clinical trial records. Complete clinical trial data is available from the National Institutes of Health, and data sets can also be downloaded in XML format. Unfortunately, genomic biomarkers are not enumerated as distinct elements within the XML, and it remains an open question how best to extract this information.
  • Scalable Data Warehouse and REST API for Cancer Genomics: Here I propose to design and implement a prototype data warehouse for storing analyzed cancer genomic data, together with a RESTful web service for retrieving data from the warehouse. Since genomic data tend to be variable and complex, document-oriented databases are well suited to this project. I plan to design a data model and implement it in the document stores MongoDB and Couchbase, as well as in the array database SciDB. I will then use Spring to implement a REST API and conduct performance tests on each of these databases.
  • Tumor Heterogeneity Tool - Visualising microevolution of cancer: Tumor heterogeneity (TH) is the phenomenon whereby tumor cells display distinct biological differences despite deriving from the same origin, and it is linked to cancer progression. Genome sequencing has provided new insights into the study of TH in cancer. Still, analysis of these data is hindered by the lack of easy-to-use visualization tools. This project aims to develop tools that combine genomic data from time-course experiments, scientific literature, and patient data, and visualize them concisely.
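The clinical trial biomarker idea leaves open how best to pull genomic markers out of free-text trial records. As one possible starting point, the Python sketch below parses a single trial record, scans its eligibility criteria for gene symbols, and records any protein change (such as V600E) that directly follows a symbol. The XML layout (`clinical_study/id_info/nct_id` and `eligibility/criteria/textblock`) is assumed to mirror the ClinicalTrials.gov export format. The sample record, the `GENES` list, and `extract_candidates` are hypothetical illustrations, not part of OncoBlocks.

```python
import re
import xml.etree.ElementTree as ET

# Toy record loosely modeled on the ClinicalTrials.gov XML layout (assumed paths:
# clinical_study/id_info/nct_id and clinical_study/eligibility/criteria/textblock).
# The trial ID and criteria text are invented for illustration only.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Hypothetical Study of a Targeted Agent</brief_title>
  <eligibility>
    <criteria>
      <textblock>
        Inclusion Criteria:
          - Histologically confirmed melanoma harboring a BRAF V600E mutation
          - HER2-negative disease
        Exclusion Criteria:
          - Prior treatment with an EGFR inhibitor
      </textblock>
    </criteria>
  </eligibility>
</clinical_study>"""

# Hypothetical, deliberately tiny dictionary of gene symbols and common aliases
# (HER2 is an alias for ERBB2); a real tool would load a curated list instead.
GENES = {"BRAF", "EGFR", "HER2", "KRAS", "ALK"}

# Whole-word match on any gene in the dictionary.
GENE_RE = re.compile(r"\b(" + "|".join(sorted(GENES)) + r")\b")
# Protein change such as V600E: reference residue, position, variant residue.
VARIANT_RE = re.compile(r"\b([A-Z]\d{1,4}[A-Z])\b")


def extract_candidates(xml_text):
    """Return (nct_id, [(gene, variant_or_None, criterion_line), ...])."""
    root = ET.fromstring(xml_text)
    nct_id = root.findtext("id_info/nct_id", default="")
    criteria = root.findtext("eligibility/criteria/textblock", default="")
    hits = []
    for line in criteria.splitlines():
        # Drop indentation and bullet dashes around each criterion.
        line = line.strip(" -\t")
        for gene_match in GENE_RE.finditer(line):
            # Check whether a protein change immediately follows the gene symbol.
            tail = line[gene_match.end():].lstrip()
            var_match = VARIANT_RE.match(tail)
            variant = var_match.group(1) if var_match else None
            hits.append((gene_match.group(1), variant, line))
    return nct_id, hits


if __name__ == "__main__":
    trial_id, candidates = extract_candidates(SAMPLE_XML)
    for gene, variant, context in candidates:
        print(trial_id, gene, variant or "-", "|", context)
```

On the toy record this flags BRAF with V600E, HER2 with no variant, and EGFR with no variant. The EGFR hit comes from an exclusion criterion: a dictionary matcher cannot distinguish inclusion from exclusion or detect negation, which is one reason a human review step would remain part of such a tool.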