Google Code-in 2010 The Apertium project

Convert Java code for decomposing compound words into C++

completed by: Kristaba

mentors: Francis Tyers, Kevin Brubeck Unhammer

The task is to take the implementation of decompounding in lttoolbox-java (see the method public String compoundAnalysis2(String input_word)) and port it to C++.


The corresponding C++ file is

Your port of compoundAnalysis2 should replace the (deprecated) method wstring FSTProcessor::decompose(wstring w) 


Decompounding means splitting an unknown word into parts, each of which could be a known word on its own. However, we require that words which may be compound parts carry a certain tag, either compoundOnlyLSymbol or compoundRSymbol, so that we don't try to build compounds out of just anything.
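
To make the idea of splitting a word into known parts concrete, here is a minimal sketch in C++. It is not the lttoolbox algorithm: the real code walks the compiled transducer, whereas this toy version assumes the set of allowed compound parts is given as a plain set of surface forms. The names Split, collectSplits and toySplits are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One candidate decomposition: the parts in left-to-right order.
using Split = std::vector<std::wstring>;

// Append to `out` every way of cutting word[start..] into pieces that are
// all members of `parts` (hypothetical stand-in for "tagged as a possible
// compound part" in the dictionary).
void collectSplits(const std::set<std::wstring>& parts,
                   const std::wstring& word, std::size_t start,
                   Split& current, std::vector<Split>& out) {
  if (start == word.size()) {   // consumed the whole word: one full split
    out.push_back(current);
    return;
  }
  for (std::size_t len = 1; start + len <= word.size(); ++len) {
    const std::wstring piece = word.substr(start, len);
    if (parts.count(piece) == 0) continue;  // not a known part, try longer
    current.push_back(piece);
    collectSplits(parts, word, start + len, current, out);
    current.pop_back();                     // backtrack
  }
}

// Convenience wrapper: all splits of a non-empty word.
std::vector<Split> toySplits(const std::set<std::wstring>& parts,
                             const std::wstring& word) {
  assert(!word.empty());
  std::vector<Split> out;
  Split current;
  collectSplits(parts, word, 0, current, out);
  return out;
}
```

With the parts fire, man, fireman and truck, calling toySplits on firemantruck yields two candidates (fire + man + truck and fireman + truck), while a word containing an unknown stretch yields none. Deciding which candidates to keep is a separate step.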

The method pruneCompounds ensures compounds have no more than compound_max_elements parts, and always end in a part which contains the compoundRSymbol symbol. 
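
The two conditions described for pruneCompounds can be sketched as a simple filter. This is only an illustration under stated assumptions, not the code in lttoolbox-java: it assumes each candidate analysis is a list of strings, one per part, with the tags written inline, and that the tag text is passed in by the caller. The names Analysis and pruneCompoundsSketch, and the tag strings used in the usage note, are placeholders chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One candidate analysis: one string per compound part, tags included.
using Analysis = std::vector<std::wstring>;

// Keep only the candidates that
//   (1) have at most compound_max_elements parts, and
//   (2) end in a part whose text contains compound_r_tag.
// Both parameters are assumptions about how the limits would be supplied.
std::vector<Analysis> pruneCompoundsSketch(
    const std::vector<Analysis>& candidates,
    const std::wstring& compound_r_tag,
    std::size_t compound_max_elements) {
  assert(!compound_r_tag.empty());
  std::vector<Analysis> kept;
  for (const Analysis& a : candidates) {
    if (a.empty() || a.size() > compound_max_elements) {
      continue;  // too many parts (or nothing to check)
    }
    if (a.back().find(compound_r_tag) == std::wstring::npos) {
      continue;  // last part may not end a compound
    }
    kept.push_back(a);
  }
  return kept;
}
```

For example, with a limit of 2 and the placeholder tag <compound-R>, a candidate such as fire<n><compound-only-L> + man<n><compound-R> is kept, while a candidate whose last part lacks the tag, or one with three parts, is dropped.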


You will need to work with your mentor so as to maintain equivalent functionality. There is a set of tests at