Google Summer of Code 2012


The CMUSphinx project is the leading speech recognition project in the open source world. Since its release as open source code in 1999, it has provided a platform for building ASR applications. Today it is used in desktop control software, telephony platforms, smart homes, and more than 20 other applications.

Over its long history, the project has been supported by CMU, SUN, Mitsubishi Electric, LIUM and many other organizations. Thousands of students use CMU Sphinx in their studies to learn the state of the art in machine learning algorithms. CMUSphinx has been the basis for more than 20 PhD theses.

With the growing interest in mobile applications, CMU Sphinx has started to support the core mobile platforms, in particular Google Android. It is now a unique library for on-device ASR, and because of that it is attracting the attention of mobile developers. CMU Sphinx is aimed at the end user, which differentiates it from other toolkits. Our goal is to bring open source speech recognition from the universities to every computer, and this moment is getting closer every day.


  • Letter to Phoneme Conversion in sphinx4
    Currently sphinx4 uses a predefined dictionary to map words to sequences of phonemes. I propose modifications to the sphinx4 code that will enable it to use trained models (built with some kind of machine learning algorithm) to map letters to phonemes, and thus map words to sequences of phonemes without the need for a predefined dictionary. A dictionary will be used only to train the required models.
  • Mobile Pronunciation Evaluation for Language Learning Using Edit Distance Scoring with CMU Sphinx3, Copious Speech Data Collection, and a Game-Based Interface
    Pronunciation learning is one of the most important parts of second language acquisition. The aim of this project is to use automatic speech recognition technologies to facilitate spoken language learning. The project will focus mainly on developing an accurate and efficient pronunciation evaluation system using CMU Sphinx3 and on maximizing adoption by implementing mobile apps built on that evaluation system. We also plan to design and implement game-based pronunciation learning to make the learning process much more fun. The project involves four specific sub-tasks: automatic edit-distance-based grammar generation, exemplar pronunciation database building, implementation of an Android pronunciation evaluation app interface, and development of a game-based learning interface.
  • Postprocessing Framework
    The postprocessing framework is the part of the speech recognition process in which the word stream produced by basic recognition is segmented into sentences, punctuation is recovered, capitalization is applied, and abbreviations are made where needed.
  • Web Data Collection for Language Modelling
    An automatic speech recognition system uses language models as well as acoustic models of speech sounds. These language models are constructed by applying machine learning algorithms to very large text corpora. The performance of the model is closely related to the amount and style of the text data. Obtaining a large amount of data for a particular domain to improve the model is expensive, as domain-specific spoken text corpora are generally sparse. Using automatic means to extract additional text from the World Wide Web is a popular approach to this problem. In this project, a web crawler that extracts additional language model training data from the web for a given domain was implemented.
  • Web-Based Pronunciation Evaluation Using Acoustic, Duration, and Phonological Scoring with CMU Sphinx3
    Create and measure the performance of an automatic pronunciation evaluation system based on Sphinx3 that detects mispronunciations at the phoneme level and provides feedback scores and learner adaptation. Phoneme, biphone, word, and phrase scores are based on standardized phoneme acoustic scores and durations, edit distance scoring using alternate pronunciation grammars, and, if time permits, articulation-based phonological features.
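
The edit distance scoring mentioned in the two pronunciation evaluation projects can be illustrated with a minimal sketch. This is not the projects' code: the function name `phoneme_edit_distance` and the ARPAbet-style example sequences are assumptions made for illustration. It computes the plain Levenshtein distance between a reference phoneme sequence and a recognized one.

```python
def phoneme_edit_distance(reference, hypothesis):
    """Levenshtein distance between two phoneme sequences (lists of strings).

    Hypothetical helper for illustration; not part of CMU Sphinx.
    """
    # previous[j] is the distance between reference[:i-1] and hypothesis[:j]
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        current = [i]  # turning reference[:i] into an empty sequence costs i
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_phone != hyp_phone)
            deletion = previous[j] + 1      # reference phone was dropped
            insertion = current[j - 1] + 1  # extra phone was spoken
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Illustrative sequences: "think" and a common mispronunciation, "sink"
reference = ["TH", "IH", "NG", "K"]
hypothesis = ["S", "IH", "NG", "K"]
distance = phoneme_edit_distance(reference, hypothesis)
```

Here a single substitution (TH replaced by S) separates the two sequences. A real evaluation system would likely weight substitutions by how confusable the phonemes are rather than charge a flat cost of one.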
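
To make the link between collected text and a language model concrete, here is a minimal sketch of a bigram model estimated from raw sentences. It is an assumption-laden illustration, not the crawler project's code: the function name `train_bigram_model` and the toy corpus are invented, and production language models add smoothing and are normally built with a dedicated toolkit.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return a function giving the maximum-likelihood P(word | previous).

    Hypothetical helper for illustration; no smoothing is applied.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Mark sentence boundaries so that sentence starts and ends are modelled
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for previous, word in zip(tokens, tokens[1:]):
            context_counts[previous] += 1
            bigram_counts[(previous, word)] += 1

    def probability(previous, word):
        if context_counts[previous] == 0:
            return 0.0  # unseen context; real systems apply smoothing here
        return bigram_counts[(previous, word)] / context_counts[previous]

    return probability


# Toy in-domain corpus (invented) standing in for text gathered from the web
corpus = ["turn on the light", "turn off the light", "turn on the radio"]
prob = train_bigram_model(corpus)
p_on = prob("turn", "on")
```

With more in-domain text the counts become less sparse, which is why the amount and style of the collected data matter so much for model quality.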