GSoC/GCI Archive
Google Code-in 2012 Apertium

Bible scraper

completed by: Daniel Huang

mentors: Francis Tyers, Jonathan

Write a scraper that parses Bible translations (especially in languages such as Uzbek, Kazakh, Mongolian, and Qaraqalpaq) from a site. The scraper should take a list of URLs of Bible translations and write only the text to a file. The output format should be roughly as follows:

[Book 1]

1. In the beginning there was a scraper.

2. The scraper made a separate line for each verse.

 

[Book 2]

1. Then there was another book.

etc.
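
The task does not name the site or its markup, so the following is only a minimal sketch of one way such a scraper could look, using the Python standard library. Everything about the page structure is an assumption made for illustration: each URL is taken to hold one book, with the book title in an <h1> element and each verse in its own <p class="verse"> element. The names VerseParser, format_book and scrape are hypothetical, and verses are numbered by their position on the page rather than by any numbering the site provides.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class VerseParser(HTMLParser):
    """Collects a book title and verse texts from one page.

    Assumed (hypothetical) markup: <h1> holds the book title and
    each verse sits in its own <p class="verse"> element.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.verses = []
        self._target = None  # "title", "verse", or None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._target = "title"
        elif tag == "p" and ("class", "verse") in attrs:
            self._target = "verse"
            self.verses.append("")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._target = None

    def handle_data(self, data):
        if self._target == "title":
            self.title += data
        elif self._target == "verse":
            self.verses[-1] += data


def format_book(html):
    """Return one book as '[Title]' followed by numbered verses."""
    parser = VerseParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so each verse ends up on a single line.
    lines = ["[%s]" % " ".join(parser.title.split())]
    for number, verse in enumerate(parser.verses, start=1):
        lines.append("%d. %s" % (number, " ".join(verse.split())))
    return "\n\n".join(lines)


def scrape(urls, out_path):
    """Fetch each URL (assumed UTF-8) and write all books to out_path."""
    books = []
    for url in urls:
        with urlopen(url) as response:
            books.append(format_book(response.read().decode("utf-8")))
    with open(out_path, "w", encoding="utf-8") as out:
        # Extra blank line between books, roughly as in the sample above.
        out.write("\n\n\n".join(books) + "\n")
```

A call such as scrape(list_of_urls, "bible.txt") would then write all books to one file. For a real site, the tag and class checks in handle_starttag would need to be adapted to that site's actual markup, and books split across several pages or chapters would need their own handling.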