GSoC/GCI Archive
Google Code-in 2012 Apertium

Extract Armenian noun translations from Wiktionary

completed by: conor-f

mentors: Francis Tyers, Jonathan

Wiktionary has many translations for Armenian nouns. For example, consider the page:

 

http://en.wiktionary.org/wiki/%D5%A1%D5%BD%D5%BF%D5%AB%D5%B3%D5%A1%D5%B6#Noun

 

Noun

աստիճան (astič̣an)

  1. degree
  2. extent

...

 

The idea of this task is to extract these translations into lttoolbox XML format as follows:

 

<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>degree<s n="n"/></r></p></e>
<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>extent<s n="n"/></r></p></e>
<e c="of stairs"><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>step<s n="n"/></r></p></e>
<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>stair<s n="n"/></r></p></e>
<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/><s n="pl"/></l><r>stairs<s n="n"/></r></p></e>
<e c="colloquial"><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>ladder<s n="n"/></r></p></e>
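The entry layout above can be produced by a small helper. The following is only a minimal sketch, not the method used for the task: the function name make_entry and its parameters are hypothetical, the default animacy tag "nn" is simply copied from the sample entries, and multi-word translations are not treated specially.

```python
from xml.sax.saxutils import escape


def make_entry(lemma, translation, comment="", animacy="nn", plural=False):
    """Build one <e> line in the layout of the sample entries.

    All names here are illustrative assumptions; only the tag layout
    is taken from the examples in the task description.
    """
    # Armenian side: noun tag, animacy tag, and an optional plural tag.
    left_tags = ["n", animacy]
    if plural:
        left_tags.append("pl")
    left = escape(lemma) + "".join('<s n="%s"/>' % tag for tag in left_tags)

    # English side: the samples carry only the noun tag.
    right = escape(translation) + '<s n="n"/>'

    # The comment goes into the c attribute, so double quotes must be escaped too.
    attr = escape(comment, {'"': "&quot;"})
    return '<e c="%s"><p><l>%s</l><r>%s</r></p></e>' % (attr, left, right)


if __name__ == "__main__":
    print(make_entry("աստիճան", "degree"))
    print(make_entry("աստիճան", "step", comment="of stairs"))
    print(make_entry("աստիճան", "stairs", plural=True))
```

Escaping the lemma, translation and comment keeps the output well-formed even if a gloss contains characters such as & or <.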

 

You will need to look out for:

* translations that hold only in the plural (see the "stairs" entry above, which adds <s n="pl"/> on the Armenian side)

* comments, which belong in the comment field (the c attribute), not in the translation

* animacy, which must be correct on the Armenian side
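The first two points can be sketched as a function that separates a definition line into translation, comment and plural flag. This is a rough illustration under stated assumptions: it expects the rendered text of a definition, with notes in parentheses such as "step (of stairs)" or "(colloquial) ladder", and the plural markers in PLURAL_HINTS are hypothetical. The wikitext source of a Wiktionary page marks such labels with templates, which this sketch does not parse. Animacy cannot be read off the gloss and still has to be checked separately.

```python
import re

# Matches one parenthesised note, e.g. "(of stairs)".
PAREN = re.compile(r"\(([^)]*)\)")

# Hypothetical markers for plural-only senses; adjust to what the pages use.
PLURAL_HINTS = ("in the plural", "in plural", "plural only")


def split_gloss(gloss):
    """Return (translation, comment, plural) for one rendered definition line."""
    notes = [note.strip() for note in PAREN.findall(gloss)]
    # Drop the parenthesised notes and normalise the remaining whitespace.
    translation = " ".join(PAREN.sub(" ", gloss).split())
    # A note that is just a plural marker sets the flag instead of the comment.
    plural = any(note.lower() in PLURAL_HINTS for note in notes)
    comment = "; ".join(n for n in notes if n.lower() not in PLURAL_HINTS)
    return translation, comment, plural


if __name__ == "__main__":
    for line in ["degree", "step (of stairs)", "(colloquial) ladder",
                 "(in the plural) stairs"]:
        print(split_gloss(line))
```

Keeping the plural marker out of the comment means the comment field only carries information that is not already expressed by a tag.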

 

For further information about this task, join us on IRC: irc.freenode.net #apertium