GSoC/GCI Archive
Google Summer of Code 2014

Bio4j

License: GNU Affero General Public License (AGPL)

Web Page: https://github.com/bio4j/gsoc14/wiki/ideas

Mailing List: https://groups.google.com/forum/#!forum/bio4j-user

Bio4j is a high-performance, cloud-enabled, graph-based bioinformatics data platform that integrates most of the data available in the most representative open data sources for protein information. It models and incorporates most of the data available in:

  • UniProtKB (Swiss-Prot + TrEMBL)
  • Gene Ontology (GO)
  • UniRef (50, 90, 100)
  • RefSeq
  • NCBI Taxonomy
  • ExPASy ENZYME

Bio4j is unique in this space as the only truly open effort: its code is licensed under AGPLv3, it integrates only open data, and its development and release process is 100% public. Bio4j was initiated in 2010 and is led by Oh no sequences!, the Era7 bioinformatics R&D group.

Projects

  • DynamoDB backed bio4j prototype
    In this proposal I present not only a simple vision of the project but also my point of view on some important matters: what it takes to write really good code, why I chose this specific project, and a short description of my experience related to this idea.
  • Gexf and GraphML exporter
    Bio4j is a high-performance, cloud-enabled, graph-based open source bioinformatics data platform that integrates the data available in the most representative open data sources for protein information. The minimum goal is to develop a tool and a minimal library/API capable of executing queries over a given Bio4j module and exporting the results in well-known formats such as GEXF and GraphML.
  • graphical browser for bio4j model
    Development of an interactive web-based tool that will allow users to explore the Bio4j domain model intuitively. The tool will show details about the nodes and edges that shape the network, along with precise typing information, so that users can understand the Bio4j structure and manage or query it more easily and efficiently.
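
To make the exporter project's goal more concrete, here is a minimal sketch of what writing a query result to GraphML could look like. It is not the project's code and does not use the Bio4j API: the function name `to_graphml`, the input shapes, and the node and relationship names in the usage example are all hypothetical, chosen only for illustration. It relies solely on Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the GraphML format.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialize a small graph as a GraphML string.

    nodes: dict mapping node id -> label (hypothetical input shape)
    edges: list of (source id, target id, relationship type) tuples
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})

    # GraphML requires every data attribute to be declared with a <key>.
    ET.SubElement(root, "key", {
        "id": "label", "for": "node",
        "attr.name": "label", "attr.type": "string",
    })
    ET.SubElement(root, "key", {
        "id": "type", "for": "edge",
        "attr.name": "type", "attr.type": "string",
    })

    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})

    # One <node> element per node, carrying its label as <data>.
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, "node", {"id": node_id})
        ET.SubElement(node, "data", {"key": "label"}).text = label

    # One <edge> element per relationship, carrying its type as <data>.
    for index, (source, target, rel_type) in enumerate(edges):
        edge = ET.SubElement(graph, "edge", {
            "id": f"e{index}", "source": source, "target": target,
        })
        ET.SubElement(edge, "data", {"key": "type"}).text = rel_type

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative identifiers only; not taken from the Bio4j model.
    example_nodes = {"P12345": "Protein", "GO:0008150": "GoTerm"}
    example_edges = [("P12345", "GO:0008150", "ANNOTATED_WITH")]
    print(to_graphml(example_nodes, example_edges))
```

A GEXF writer would follow the same pattern with that format's element names. Keeping the query step separate from the serialization step, as assumed here, would let both output formats share one intermediate result.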