Better Wikipedia extractor script
completed by: Ben Stobaugh
mentors: Jonathan Washington, Francis Tyers
Write a single script that performs all the steps listed at Wikipedia Extractor. That is, it should take a Wikipedia dump file as input and produce a file that is, for all intents and purposes, identical to the output of the last step listed on the wiki. The script should store no intermediate files anywhere and should use no more memory than absolutely necessary, but feel free to reuse as much of the existing code as you need. You may wish to consult guampa's much-improved fork of the WikiExtractor script, though it doesn't do everything itself either.
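One way to meet the "no intermediate files, minimal memory" requirement is to stream the dump and handle one page at a time. The following is only a minimal sketch of that skeleton, not the finished task: the names `local_name`, `iter_pages` and `extract` are hypothetical, it assumes the usual MediaWiki XML export layout (`page`, `title`, `text` elements), and it writes the raw wikitext without performing any of the cleanup steps listed on the wiki.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs one page at a time from a binary stream.

    Assumes the usual MediaWiki export layout; only the current page is
    kept in memory.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    title, text = "", ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            root.clear()  # discard the finished page so memory stays bounded


def extract(stream, out):
    """Write each non-empty page's text to `out`, one page per line group.

    The cleanup steps from the wiki would be applied to `text` here;
    this sketch deliberately leaves the wikitext untouched.
    """
    for _title, text in iter_pages(stream):
        text = text.strip()
        if text:
            out.write(text + "\n")
```

Calling `extract(sys.stdin.buffer, sys.stdout)` (after `import sys`) would let a decompressor such as `bzcat` pipe the dump straight into the script, so nothing but the final output ever touches the disk.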