Google Summer of Code 2012 Open Bioinformatics Foundation

SearchIO Implementation in Biopython

by Wibowo Arindrarto for Open Bioinformatics Foundation

Biopython is a widely used Python toolset for working with biological data. It was built mainly to simplify biological data analysis workflows, for example by providing parsers for various data file formats, wrappers for command-line programs, and interfaces for remote data sources. However, it still lacks a common framework for working with the output of sequence search programs. These programs perform similarity-based searches across biological sequence databases, a task integral to modern biological research. Unfortunately, extracting information from their output is often difficult because of the volume of results produced and the density of information packed into them. To solve this problem, this project aims to add a submodule called SearchIO to Biopython. SearchIO will allow more systematic extraction of information from the output of sequence search programs and easier interaction with various output formats through a common programming interface.
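To illustrate what a "common programming interface" over several output formats could look like, here is a minimal standard-library sketch. It is not the SearchIO design itself: the names `Hit`, `QueryResult`, `register`, and `parse`, as well as the `toy-tab` format (tab-separated query ID, hit ID, and E-value per line), are all hypothetical and chosen only for illustration.

```python
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, TextIO


@dataclass
class Hit:
    """One database sequence matched by a query (hypothetical model)."""
    hit_id: str
    evalue: float


@dataclass
class QueryResult:
    """All hits reported for a single query sequence (hypothetical model)."""
    query_id: str
    hits: List[Hit] = field(default_factory=list)


# Registry mapping a format name to its parser function.
_PARSERS: Dict[str, Callable[[TextIO], Iterator[QueryResult]]] = {}


def register(fmt: str):
    """Decorator that registers a parser under the given format name."""
    def decorator(func):
        _PARSERS[fmt] = func
        return func
    return decorator


@register("toy-tab")
def _parse_toy_tab(handle: TextIO) -> Iterator[QueryResult]:
    """Parse the made-up 'toy-tab' format: query_id, hit_id, evalue."""
    current = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query_id, hit_id, evalue = line.split("\t")
        # A new query ID closes the previous QueryResult.
        if current is None or current.query_id != query_id:
            if current is not None:
                yield current
            current = QueryResult(query_id)
        current.hits.append(Hit(hit_id, float(evalue)))
    if current is not None:
        yield current


def parse(handle: TextIO, fmt: str) -> Iterator[QueryResult]:
    """Single entry point: same call and same objects for every format."""
    try:
        parser = _PARSERS[fmt]
    except KeyError:
        raise ValueError(f"Unknown format: {fmt!r}") from None
    return parser(handle)


SAMPLE = "q1\thitA\t1e-30\nq1\thitB\t0.002\nq2\thitC\t5e-10\n"
results = list(parse(io.StringIO(SAMPLE), "toy-tab"))
for result in results:
    print(result.query_id, [hit.hit_id for hit in result.hits])
```

In this sketch, supporting another program's output would mean registering one more parser function that yields the same `QueryResult` objects, so calling code would not need to change.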