GSoC/GCI Archive
Google Code-in 2012 Sugar Labs

OCR Document formatting (4 of 46)

completed by: Ambar Pal

mentors: Chris Leonard

The overall goal is to transform a 450+ page image-PDF file of an Asháninka-Spanish and Spanish-Asháninka dictionary into OCR'ed digital text (see the general description). The work has been broken into a series of individual tasks; the pages to be processed for this task are those in batch 4.