Google Summer of Code 2013

The Centre for Computational Medicine

Web Page:

Mailing List:

The Centre for Computational Medicine (CCM) is a Core Facility within the SickKids Research Institute that provides computational expertise, including High Performance Computing resources, Bioinformatics Analysis consulting and Software Development. At the CCM, we develop free, open source software for clinical genetics that aims to empower scientists to transform their data into knowledge. Our tools have a user base that includes many clinical and research facilities across Canada, the US and Europe. Our target audience is non-technical users, whom we help cross the technological barrier. Our software aims to make their lives easier by integrating seamlessly into their work routine and shielding them from technical complexities. From a development point of view, we focus on usability and accessibility, as well as on big data visualization and interpretation assistance.


  • Extraction of Ontology Terms from Free-Form Clinical Notes: PhenoTips is a web-based software tool that gives doctors and clinicians an easy-to-use interface for maintaining and analyzing patient data efficiently. PhenoTips currently provides an auto-suggest feature, backed by an Apache Lucene/Solr search system, with which a doctor or clinician can search for a phenotype. However, the Human Phenotype Ontology has approximately 10,000 terms, and no user can be expected to remember every one. Through this project I therefore propose to build a system that facilitates the migration of patient records into PhenoTips by automatically extracting Human Phenotype Ontology terms from free-form clinical text. This will be user friendly and will also save users a lot of time.
  • Parallelization of Queries in MedSavant: MedSavant is a search engine for genomic variants. It is a client-server application whose server is mainly dedicated to fetching results via Infobright Community Edition, a specialized, SQL-based, single-threaded database. A query on a huge dataset can be sped up by dividing the dataset into pieces called shards, running the query on each shard in parallel, and assembling the partial results. This project aims to allow such parallelization of queries in MedSavant.
  • Solr Backend for MedSavant: This project consists of creating a data access interface for MedSavant using the Solr search server and comparing its performance against the current Infobright backend.
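
The shard-and-assemble idea behind the query parallelization project can be illustrated with a minimal Python sketch. This is not MedSavant's implementation: the in-memory list stands in for the database, and the names `VARIANTS`, `make_shards`, `query_shard` and `parallel_query` are hypothetical, chosen only for illustration. It assumes the query is a simple filter whose per-shard results can be merged by concatenation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variant records: (chromosome, position, reference, alternate).
# In a real deployment each shard would live in its own database instance.
VARIANTS = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2500, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr2", 4100, "T", "C"),
    ("chr3", 50, "A", "T"),
    ("chr3", 900, "G", "C"),
]


def make_shards(rows, shard_count):
    """Split rows into shard_count roughly equal pieces (round-robin)."""
    return [rows[i::shard_count] for i in range(shard_count)]


def query_shard(shard, predicate):
    """Stand-in for running the same filter query against one shard."""
    return [row for row in shard if predicate(row)]


def parallel_query(rows, predicate, shard_count=3):
    """Scatter the query over all shards, then gather and merge the results."""
    shards = make_shards(rows, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = list(pool.map(lambda s: query_shard(s, predicate), shards))
    # Assemble: concatenate partial results and sort for a stable order.
    merged = [row for partial in partials for row in partial]
    return sorted(merged)


if __name__ == "__main__":
    for hit in parallel_query(VARIANTS, lambda v: v[1] >= 900):
        print(hit)
```

Concatenation is enough for filter queries like this one. Aggregate queries (counts, averages) would need a query-specific merge step, which is one of the design questions such a project has to address.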