GSoC/GCI Archive
Google Code-in 2014 Apertium

Write a general Apertium Stream Format parser in Python.

completed by: Sushain Cherivirala

mentors: Kevin Brubeck Unhammer, Jonathan

A lot of our scripts need to read the Apertium Stream Format and, e.g., extract the lemmas or surface forms of all nouns and similar things. Write a general Python module that we can import, which defines generators that yield LexicalUnit objects, so that we can write [(lu.lemma, lu.readings[0:]) for lu in asfparse.parse_file(sys.stdin)]. Something like this is already implemented in the concordancer, so this task may just consist of abstracting that code into a library usable as described here.
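A minimal sketch of such a module follows. It assumes only the basic stream syntax: each lexical unit is written ^surface/lemma<tag>.../lemma<tag>...$, a backslash escapes the next character, and anything outside ^...$ (including [superblanks]) is formatting to be skipped. The class names Reading and LexicalUnit, their fields, and the choice to expose lu.lemma as the lemma of the first reading are assumptions for illustration. They are not the concordancer's code or an existing Apertium API. Multiword joins (+) and other stream extensions are not handled specially.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Reading:
    lemma: str
    tags: List[str] = field(default_factory=list)


@dataclass
class LexicalUnit:
    surface: str
    readings: List[Reading] = field(default_factory=list)

    @property
    def lemma(self) -> str:
        # Assumption: a unit's "lemma" is the lemma of its first reading,
        # falling back to the surface form when there are no readings.
        return self.readings[0].lemma if self.readings else self.surface


def parse_file(stream: Iterable[str]) -> Iterator[LexicalUnit]:
    """Yield one LexicalUnit per ^...$ in the stream; blanks and [superblanks] are skipped.

    `stream` may be any iterable of strings, e.g. sys.stdin or an open file.
    """
    in_unit = in_superblank = in_tag = escaped = False
    fields: List[Tuple[str, List[str]]] = []  # (text, tags) per /-separated field
    text: List[str] = []  # characters of the current surface form or lemma
    tags: List[str] = []  # completed <tags> of the current field
    tag: List[str] = []   # characters of the tag currently being read
    for chunk in stream:
        for ch in chunk:
            if escaped:
                # A backslash makes the next character literal.
                escaped = False
                if in_unit:
                    (tag if in_tag else text).append(ch)
            elif ch == "\\":
                escaped = True
            elif in_superblank:
                if ch == "]":
                    in_superblank = False
            elif not in_unit:
                # Outside a unit: only superblanks and unit starts matter.
                if ch == "[":
                    in_superblank = True
                elif ch == "^":
                    in_unit = True
                    fields, text, tags = [], [], []
            elif in_tag:
                if ch == ">":
                    tags.append("".join(tag))
                    in_tag = False
                else:
                    tag.append(ch)
            elif ch == "<":
                in_tag, tag = True, []
            elif ch in "/$":
                # End of a field: the first is the surface form, the rest are readings.
                fields.append(("".join(text), tags))
                text, tags = [], []
                if ch == "$":
                    in_unit = False
                    yield LexicalUnit(
                        surface=fields[0][0],
                        readings=[Reading(lemma, t) for lemma, t in fields[1:]],
                    )
            else:
                text.append(ch)
```

Because the parser keeps its state across chunks and only iterates over strings, the same function works on sys.stdin, an open file, or a list of strings in a test. Saved as asfparse.py, it would support calls such as [lu.lemma for lu in asfparse.parse_file(sys.stdin)].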
For further information and guidance on this task, you are encouraged to come to our IRC channel.