GSoC/GCI Archive
Google Summer of Code 2015

Biomedical Informatics, Emory University

License: Apache License, 2.0

Web Page:

Mailing List:

Biomedical Informatics (BMI) is a multidisciplinary field that studies and pursues the effective use of biomedical and clinical data through novel computational approaches, driven by efforts to improve diagnosis, clinical care, and human health. The BMI department works in the area of Big Data Analytics for Healthcare. We apply our expertise in computer science and informatics to develop enabling tools, technologies, and algorithms that address specific biomedical and clinical problems. In doing so, we help advance the understanding of disease and treatment and produce useful software and applications. Members of the department work in areas that include machine learning, healthcare middleware that leverages cloud computing, clinical information systems, clinically oriented image analysis, and biomedical knowledge modeling. The driving applications for the ongoing projects include cancer research, organ transplantation, HIV, medical imaging, radiation therapy, and clinical data analytics. All development work undertaken is free and open source.


  • Bulk Data Transfer: The Cancer Imaging Archive (TCIA) currently uses a Java WebStart solution, which is to be replaced by a web-based version. The new solution should support parallel streaming, download tracking, and data consistency validation, and must be able to recover from failures and interruptions. A link to the Google Doc version of the proposal is given in the "Additional Info URL" field.
  • Data Federation / Integration Tools: Medical image archives consist of heterogeneous data sources. Medical images are stored across multiple data sources and databases, which need to be federated and integrated to support more complex use cases. The data consumer should be able to federate data of interest across diverse data sets without knowing much about the information model of each data source. Future work identified in the last GSoC should be implemented, and the platform should be integrated with the current work.
  • Integrating CNN with an Interactive Machine Learning System: This project aims to improve the performance of the Convolutional Neural Network currently used in the prototype through three approaches: testing features other than the current ones, implementing a new model that incorporates the unlabeled data, and fine-tuning and optimising the new model.
  • TCIA Data Exploration and Information Visualization: Our goal is to build a generic data explorer that allows visualization designers to author visualization dashboards. One of the biggest challenges is pooling data from different remote archives in an integrated way. We propose a client-server architecture that uses MongoDB for filtering in order to handle big datasets. On the client side we will use d3.js/dc.js, with brushing-and-linking-style features for interactive visual analysis. We will then use the data explorer to author a dashboard for TCIA.
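
The Bulk Data Transfer requirements (parallel streaming, download tracking, consistency validation, and recovery from interruptions) can be illustrated with a small sketch. This is not the TCIA implementation: the function names, the chunk size, the in-memory `done` map, and the use of SHA-256 are all assumptions made for the example, and `fetch_range` stands in for whatever range-request mechanism a real client would use.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_ranges(total_size, chunk_size=CHUNK_SIZE):
    """Return (start, end) byte ranges covering total_size; end is exclusive."""
    return [(start, min(start + chunk_size, total_size))
            for start in range(0, total_size, chunk_size)]


def download(fetch_range, total_size, expected_sha256, done=None, workers=4):
    """Fetch missing ranges in parallel, then validate the assembled data.

    fetch_range(start, end) is a hypothetical callable that returns bytes
    or raises OSError on a transfer failure.
    done maps (start, end) -> bytes for chunks kept from an interrupted run.
    Returns (data, done); data is None while chunks are still missing.
    """
    done = dict(done or {})
    all_ranges = split_ranges(total_size)
    missing = [r for r in all_ranges if r not in done]

    def attempt(byte_range):
        try:
            return byte_range, fetch_range(*byte_range)
        except OSError:
            return byte_range, None  # leave this chunk for a later resume

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for byte_range, chunk in pool.map(attempt, missing):
            if chunk is not None:
                done[byte_range] = chunk

    if len(done) < len(all_ranges):
        return None, done  # incomplete: the caller can resume with `done`

    data = b"".join(done[r] for r in sorted(done))
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: downloaded data is corrupt")
    return data, done


# Simulated source: one chunk fails on its first request only.
payload = b"example image series bytes"
checksum = hashlib.sha256(payload).hexdigest()
failed_once = set()


def flaky_fetch(start, end):
    if start == 8 and start not in failed_once:
        failed_once.add(start)
        raise ConnectionError("simulated interruption")
    return payload[start:end]


first, progress = download(flaky_fetch, len(payload), checksum)
second, progress = download(flaky_fetch, len(payload), checksum, done=progress)
```

In this sketch the first call returns no data because one chunk failed, and the second call re-requests only that chunk before verifying the checksum. The size of `done` relative to the number of ranges gives a simple progress measure; a real client would persist that state rather than keep it in memory.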