Convert Java code for decomposing compound words into C++
completed by: Kristaba
mentors: Francis Tyers, Kevin Brubeck Unhammer
The task is to take the implementation of decompounding in lttoolbox-java (see the method public String compoundAnalysis2(String input_word)) and port it to C++.
The corresponding C++ file is:
https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox/lttoolbox/fst_processor.cc
Your port of compoundAnalysis2 should replace the deprecated method wstring FSTProcessor::decompose(wstring w).
Decompounding means splitting an unknown word into several parts, each of which could be a known word on its own (see http://wiki.apertium.org/wiki/Compounding). However, a word is only considered as a possible compound part if it carries one of two tags, compoundOnlyLSymbol or compoundRSymbol, so that arbitrary known words are not treated as compound parts.
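A minimal sketch of the splitting step, to make the tag requirement concrete. It is not the lttoolbox-java algorithm: it assumes a hypothetical std::map lexicon in place of the transducer, uses std::string instead of wstring for brevity, and assumes the tag spellings "<compound-only-L>" and "<compound-R>" for compoundOnlyLSymbol and compoundRSymbol. All names here (Lexicon, isCompoundPart, decompound) are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy lexicon: surface form -> tags its analysis carries.
// The real port walks the FST in fst_processor.cc instead of a map.
using Lexicon = std::map<std::string, std::set<std::string>>;

// Assumed spellings standing in for compoundOnlyLSymbol / compoundRSymbol.
const std::string kCompoundOnlyL = "<compound-only-L>";
const std::string kCompoundR = "<compound-R>";

// A form may be a compound part only if it is known AND carries a compound tag.
bool isCompoundPart(const Lexicon& lex, const std::string& form) {
  auto it = lex.find(form);
  if (it == lex.end()) return false;
  return it->second.count(kCompoundOnlyL) > 0 || it->second.count(kCompoundR) > 0;
}

// Recursively collect every way of splitting word[start..] into tagged parts.
void splitFrom(const Lexicon& lex, const std::string& word, std::size_t start,
               std::vector<std::string>& current,
               std::vector<std::vector<std::string>>& out) {
  if (start == word.size()) {
    if (current.size() >= 2) out.push_back(current);  // a compound needs 2+ parts
    return;
  }
  for (std::size_t end = start + 1; end <= word.size(); ++end) {
    std::string part = word.substr(start, end - start);
    if (!isCompoundPart(lex, part)) continue;  // untagged words are skipped
    current.push_back(part);
    splitFrom(lex, word, end, current, out);
    current.pop_back();
  }
}

std::vector<std::vector<std::string>> decompound(const Lexicon& lex,
                                                 const std::string& word) {
  std::vector<std::vector<std::string>> out;
  std::vector<std::string> current;
  splitFrom(lex, word, 0, current, out);
  return out;
}
```

With a lexicon containing "sol" (tagged compound-only-L), "krem" (tagged compound-R) and an untagged "kre", decompound(lex, "solkrem") yields the single split {"sol", "krem"}. The sketch deliberately leaves the "last part must be compound-R" rule to a separate pruning step, mirroring the division of labour described for pruneCompounds.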
The method pruneCompounds ensures that compounds have at most compound_max_elements parts and always end in a part that carries the compoundRSymbol tag.
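The two pruning rules above can be sketched as a simple filter over candidate analyses. This is a hedged illustration, not a translation of pruneCompounds: the Part/Candidate types, the function name pruneCandidates and the tag spelling "<compound-R>" are all assumptions, and the Java method may apply further criteria not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation of one compound part: surface form plus tags.
struct Part {
  std::string form;
  std::set<std::string> tags;
};
using Candidate = std::vector<Part>;

// Assumed spelling for compoundRSymbol.
const std::string kCompoundR = "<compound-R>";

// Keep only candidates that (1) have at most maxElements parts and
// (2) end in a part carrying the compound-R tag.
std::vector<Candidate> pruneCandidates(const std::vector<Candidate>& candidates,
                                       std::size_t maxElements) {
  std::vector<Candidate> kept;
  for (const Candidate& c : candidates) {
    if (c.empty() || c.size() > maxElements) continue;  // too many parts
    if (c.back().tags.count(kCompoundR) == 0) continue;  // bad final part
    kept.push_back(c);
  }
  return kept;
}
```

Here maxElements plays the role of compound_max_elements. A candidate such as sol+kre+m is dropped when the limit is 2, and sol+krem is dropped if "krem" lacks the compound-R tag, even though both splits cover the whole word.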
You will need to work with your mentor to make sure the port keeps equivalent functionality. A set of tests is available at http://apertium.codepad.org/aB0kcLMO