Google Summer of Code 2013

Crowdsourcing Biology at the Scripps Research Institute

Web Page:

Mailing List:!forum/crowdbio

Our laboratory focuses on applying the tools of bioinformatics (the intersection of computer science, biology, and statistics) to biomedical discovery. We are part of the Scripps Research Institute in La Jolla, California, one of the largest non-profit research organizations devoted to biomedical research. Our GSoC application and ideas focus on synergies between crowdsourcing and biological research.

Our team develops new technologies and tools that enable large communities of scientists to collaboratively address biological challenges of massive scale. We operate at the exciting intersection between biological and computational research. Our flagship projects include:

- The Gene Wiki, an initiative to create a continuously updated, community-reviewed, and collaboratively written review article for every human gene. The Gene Wiki receives over 50 million pageviews per year from both scientists and the general public.
- BioGPS, a community-extensible gene annotation portal for aggregating and integrating gene-centric knowledge. This resource is aimed more at active researchers, and it receives approximately two million pageviews per year.

Through these efforts, we have already demonstrated that we can productively harness the efforts of thousands of scientists and community members. So far we have focused on large challenges in biological knowledge management. To complement the collaborative spirit of these projects, we have recently turned our own software development projects into open source projects.


  • Convert Gene Wiki Bot to write to Wikidata: Pygenewiki (the Gene Wiki Bot) automatically creates and updates the ProteinBox templates that form the infoboxes of Gene Wiki articles. It retrieves information about genes from databases (such as NCBI and HUGO), populates the ProteinBox templates with this information, and inserts the templates into Gene Wiki articles. Wikidata is a free knowledge base that aims to make structured data easily accessible to anyone. The proposed new bot would write the information for each gene to a Wikidata item and map that item to the corresponding gene article. This will help create structured data about genes in an easily accessible database. Other aspects of a gene, such as its relationships with other genes and with diseases, can also be captured.
  • Developing an interactive decision tree builder for The Cure: The present Cure interface includes a simple tree that is constructed automatically whenever the user clicks on a gene or protein. The aim of my project is to give the user more control over building this tree, to make the tree more interactive, and to show more meaningful data through it. In essence, it would be an interactive tree builder that gives the user complete control over the tree. The constructed tree would visually represent numerical values, namely the bin size, the errors, and the accuracy of the entire tree.
  • New Version of the Game Dizeez as a Facebook Application: Dizeez is a quiz game whose theme is gene-disease connections. Dizeez is currently implemented as a standalone web application using Python and CGI, with its data stored in JSON files. This project is about implementing Dizeez as a Facebook Canvas application (game). A detailed description of the project is provided in the Project Abstract section.
  • Semantic BioGPS: The aim of this project is to gather information about genes that is scattered across different sites on the web. This is done by selecting the parts of a web page that are to be extracted and then presenting the extracted data for a large list of genes. The goal for this summer is to make the annotation process more robust, to make the web crawling more reliable, and to create an interface for accessing the data.
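
To make the values named in the decision tree builder abstract more concrete, here is a minimal Python sketch of how a bin size and an error count per node, plus an accuracy for the whole tree, could be computed. It is not The Cure's implementation. The names `Node`, `annotate`, and `tree_accuracy` are hypothetical, the gene name and expression values are toy data, and the definitions used (errors as samples that disagree with a node's majority class, accuracy as the fraction of samples classified correctly at the leaves) are assumptions rather than anything stated by the project.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A sample is (feature values keyed by gene name, class label). Toy format, assumed.
Sample = Tuple[Dict[str, float], str]


@dataclass
class Node:
    """Hypothetical tree node: a split on one gene, or a leaf if gene is None."""
    gene: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # receives samples with value <= threshold
    right: Optional["Node"] = None   # receives samples with value > threshold
    bin_size: int = 0                # number of samples reaching this node
    errors: int = 0                  # samples that disagree with the majority label
    label: Optional[str] = None      # majority label among samples at this node


def annotate(node: Node, samples: List[Sample]) -> None:
    """Fill in bin_size, label and errors for a node and its whole subtree."""
    labels = [lab for _, lab in samples]
    node.bin_size = len(samples)
    # Majority label; ties are broken alphabetically so the result is deterministic.
    node.label = max(sorted(set(labels)), key=labels.count) if labels else None
    node.errors = sum(1 for lab in labels if lab != node.label)
    if node.gene is not None:
        annotate(node.left, [s for s in samples if s[0][node.gene] <= node.threshold])
        annotate(node.right, [s for s in samples if s[0][node.gene] > node.threshold])


def leaves(node: Node) -> List[Node]:
    """Return the leaves of the subtree rooted at node, left to right."""
    if node.gene is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def tree_accuracy(root: Node) -> float:
    """Fraction of samples that match the majority label of the leaf they reach."""
    if root.bin_size == 0:
        return 0.0
    wrong = sum(leaf.errors for leaf in leaves(root))
    return (root.bin_size - wrong) / root.bin_size


# Toy usage: one user-chosen split on a single gene.
toy_samples: List[Sample] = [
    ({"BRCA1": 0.2}, "healthy"),
    ({"BRCA1": 0.4}, "healthy"),
    ({"BRCA1": 0.3}, "disease"),
    ({"BRCA1": 0.8}, "disease"),
    ({"BRCA1": 0.9}, "disease"),
]
toy_tree = Node(gene="BRCA1", threshold=0.5, left=Node(), right=Node())
annotate(toy_tree, toy_samples)
print(toy_tree.left.bin_size, toy_tree.left.errors, tree_accuracy(toy_tree))
```

In an interactive builder, something like `annotate` would presumably be rerun each time the user adds or removes a split, so that the displayed bin sizes, errors, and accuracy stay in step with the tree.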