Google Summer of Code 2014 Open Bioinformatics Foundation

Addition of a Lazy Loading Sequence Parser to Biopython’s SeqIO Package

by Evan Parker for Open Bioinformatics Foundation

Biopython’s SeqIO package parses sequence files, from the popular FASTA format to heavily annotated formats such as the GenBank flat file format. Currently, the module parses each record completely before returning a sequence record object. By implementing an indexing, lazy-loading sequence parser, Biopython can make working with large sequence files, such as whole chromosomes or entire genomes, more efficient.
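
To illustrate the general idea of indexing and lazy loading, here is a minimal standard-library sketch for FASTA files. It is not the parser proposed in this project and does not use Biopython's API: the names build_fasta_index, LazyFastaRecord, and write_demo_fasta are hypothetical, and the sketch assumes a well-formed ASCII FASTA file. The index records only the byte offset of each header line, and a record's sequence is read from disk the first time it is requested.

```python
import os
import tempfile


def build_fasta_index(path):
    """Map each record ID to the byte offset of its '>' header line.

    Hypothetical helper: one pass over the file, storing no sequence data.
    """
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                record_id = fields[0].decode("ascii") if fields else ""
                index[record_id] = offset
    return index


class LazyFastaRecord:
    """Hypothetical record whose sequence is read only on first access."""

    def __init__(self, path, record_id, offset):
        self.path = path
        self.id = record_id
        self._offset = offset
        self._seq = None  # nothing has been loaded yet

    @property
    def seq(self):
        if self._seq is None:
            parts = []
            with open(self.path, "rb") as handle:
                handle.seek(self._offset)
                handle.readline()  # skip the header line
                for line in handle:
                    if line.startswith(b">"):
                        break  # reached the next record
                    parts.append(line.strip())
            self._seq = b"".join(parts).decode("ascii")
        return self._seq


def write_demo_fasta():
    """Write a tiny two-record FASTA file and return its path (demo only)."""
    fd, path = tempfile.mkstemp(suffix=".fasta")
    with os.fdopen(fd, "w") as handle:
        handle.write(">chr1 demo\nACGT\nACGT\n>chr2 demo\nGGCC\n")
    return path


if __name__ == "__main__":
    demo_path = write_demo_fasta()
    offsets = build_fasta_index(demo_path)
    record = LazyFastaRecord(demo_path, "chr2", offsets["chr2"])
    print(record.id, record.seq)  # the sequence is read from disk here
    os.remove(demo_path)
```

In this sketch the cost of opening a large file is a single scan that keeps one integer per record, and only the records actually accessed are ever read into memory. A real implementation for annotated formats such as GenBank would also need to index and defer the parsing of features and annotations, which this example does not attempt.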