GSoC/GCI Archive
Google Summer of Code 2009

NESCent - National Evolutionary Synthesis Center

Web Page: http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

Mailing List: phylosoc@nescent.org

NESCent facilitates synthetic research on grand challenge questions in evolutionary biology. It also works to address critical needs in software infrastructure and education by promoting the open, collaborative development of interoperable, standards-supporting open-source software. The Center is located in Durham, North Carolina. It is jointly operated by Duke University, the University of North Carolina at Chapel Hill, and North Carolina State University, and receives its core funding from the National Science Foundation (NSF).

NESCent has so far run three hackathons aimed at improving interoperability, workflow integration, and standards support in phyloinformatics. These involved developers of open-source life-science programming toolkits, evolutionary and comparative phylogenetic methods software, and online data resources. These events, together with our past participation in the Summer of Code, continue to have a significant and lasting impact on collaborative software development in our field. The Center is committed to FLOSS and to the sharing of scientific data (see, for example, the NESCent Data and Software Policy at http://www.nescent.org/informatics/data_software_policy.php). All of the Center's software products are released as open source and established as collaborative projects on sites such as SourceForge or Google Code. Members of the Center's Informatics team are lead developers in several open-source projects, and one of our organization administrators has been active on the Board of the Open Bioinformatics Foundation (http://www.open-bio.org/), the umbrella organization for the Bio* projects, for seven years.

Projects

  • A BioLib mapping for the libsequence population genetic libraries: BioLib brings together a set of open-source C/C++ libraries and makes them available to all of the Bio* languages: BioPerl, Biopython, etc. BioLib uses the SWIG toolkit to map (wrap) the C/C++ code for the Bio* languages. Libsequence is a C++ library for modern population genetic simulation. The objective of this project is to build BioLib mappings for libsequence.
  • Biogeographical Phylogenetics for Biopython: Create a Biopython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these records to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert the output of these programs into files suitable for mapping, e.g., in Google Earth (KML files).
  • BioPerl integration of the NeXML exchange standard and Bio::Phylo toolkit: This project will integrate the NeXML exchange standard into BioPerl, facilitating adoption of the standard and easing the transition from the overworked NEXUS format. A wrapper will give BioPerl native access to the preferred NeXML parser (Bio::Phylo), allowing Bio::Phylo and NeXML to co-evolve without being encumbered by BioPerl. Test cases and example sets that target real-world use cases will also be developed. More info: http://filebox.vt.edu/users/chmille4/G.pdf
  • Biopython support for parsing and writing phyloXML: PhyloXML is an XML format for phylogenetic trees, designed to store information about the trees themselves (such as branch lengths and multiple support values) along with associated data such as taxonomic and genomic annotations. Connecting these pieces of evolutionary information in a standard format is key for comparative genomics. A BioPerl driver for phyloXML was created during the 2008 Summer of Code; this project aims to build a similar module for the popular Biopython package.
  • Build a Mesquite package to view Phenex-generated NeXML files: Extend Mesquite's ability to represent character matrices by allowing the addition of ontology terms produced by Phenex in the NeXML format. Mesquite already supports the NeXML file format; by the end of this project it would be able to display Phenex files directly and, depending on the pace of development, render chained ontology concepts graphically using a library such as Graphviz.
  • Enhance the searching functionality of Phylr: PhyloWS is an emerging standard for interacting with phylogenetic data via web services. Phylr is an initial implementation of SRU search in PhyloWS. The goals of this project are to enhance the Java code that translates between the SRU server and a Lucene index so that it fully conforms to the PhyloWS specification, to modify the XSL stylesheets to give users a more intuitive interface, and to write a new Java adapter that translates between the SRU server and a relational database.
  • GPU acceleration for phylogenetic inference using OpenCL: In this project I will implement an open-source C++ library that computes phylogenetic tree likelihoods on the GPU, with the core loops written in OpenCL. Phylogenetic inference currently faces strong demand for more computational speed, and likelihood calculation is typically the main bottleneck. Moving to GPU acceleration is a natural way to address this, and the project has the potential to benefit several state-of-the-art evolutionary analysis packages.
  • Implementing phyloXML support in BioRuby: Phylogenetic trees are used in important applications, including phylogenomics, phylogeography, gene function prediction, cladistics, and the study of molecular evolution. The phyloXML format was developed to support the analysis, exchange, storage, and reuse of phylogenetic trees and their associated data. It can store the information needed to describe a phylogenetic tree, such as clades, sequences, names, and distances. The goal of this project is to implement phyloXML support in BioRuby.
  • Mapping the Bio++ Phylogenetics toolkit to R/Bioconductor and BioJava using BioLib: Bio++ is a C++ library of phylogenetics algorithms. Unfortunately, relatively few bioinformatics scientists work in C++. The plan is therefore to map the library into several other languages, including Java and R (mappings for Perl, Python, and Ruby already exist). Doing so should allow the library to reach the majority of active bioinformaticians.
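The tree-likelihood calculation targeted by the OpenCL project can be illustrated with a minimal CPU sketch of Felsenstein's pruning algorithm under the Jukes-Cantor substitution model. This is not code from that project or from any toolkit listed here; the `Node` class and the function names are hypothetical, and a production library would add richer substitution models, rate heterogeneity, and numerical scaling to avoid underflow. The sketch does show why the computation suits a GPU: every alignment site is evaluated independently, and the four state entries at each node are computed independently once the child vectors are known.

```python
import math

# Nucleotide states in a fixed order; index i of a likelihood vector is STATES[i].
STATES = "ACGT"


def jc_transition(t):
    """Jukes-Cantor transition matrix P(t) for a branch of length t
    (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay   # probability of ending in the same state
    diff = 0.25 - 0.25 * decay   # probability of ending in each other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


class Node:
    """Hypothetical minimal tree node.

    branch:   length of the branch leading to this node from its parent
    children: child nodes (empty for a tip)
    seq:      aligned sequence (tips only)
    """

    def __init__(self, branch=0.0, children=None, seq=None):
        self.branch = branch
        self.children = children or []
        self.seq = seq


def partials(node, site):
    """Conditional likelihood vector at one site:
    L[s] = P(observed tips below node | node is in state s)."""
    if not node.children:
        base = node.seq[site]
        if base not in STATES:
            # Gap or ambiguous base: treat as missing data.
            return [1.0] * 4
        return [1.0 if s == base else 0.0 for s in STATES]
    result = [1.0] * 4
    for child in node.children:
        child_l = partials(child, site)
        p = jc_transition(child.branch)
        # Sum over the child's state, multiply across children.
        for i in range(4):
            result[i] *= sum(p[i][j] * child_l[j] for j in range(4))
    return result


def log_likelihood(root, n_sites):
    """Total log-likelihood; sites are independent, so this loop is the
    part a GPU implementation would parallelize."""
    total = 0.0
    for site in range(n_sites):
        root_l = partials(root, site)
        # Jukes-Cantor equilibrium frequencies are 0.25 for every base.
        total += math.log(sum(0.25 * x for x in root_l))
    return total


if __name__ == "__main__":
    tree = Node(children=[
        Node(branch=0.1, seq="ACGT"),
        Node(branch=0.2, seq="ACGA"),
    ])
    print(log_likelihood(tree, 4))
```

For a two-tip tree like the one above, the result matches the closed form for a single branch of length 0.3 (the sum of the two branches), since the Jukes-Cantor model is time-reversible; that makes a convenient sanity check when porting such loops to a GPU.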