Xapian Search Engine Library
Web Page: http://trac.xapian.org/wiki/GSoCProjectIdeas
Mailing List: http://trac.xapian.org/wiki/GSoC_Mailing_List
Xapian is a search engine library that aims to be fast, scalable, and flexible. It is used by many organizations around the world, including Debian, One Laptop per Child, and the Gmane mailing list archive. It supports probabilistic ranking and a rich set of boolean query operators. The core library is written in C++, with bindings to allow use from C#, Java, Perl, PHP5, Python, Ruby, and Tcl.
- browse our list of suggested ideas.
- contact us via IRC or e-mail.
Our code repository can be found here: http://code.google.com/p/google-summer-of-code-2011-xapian/
- Chinese segmentation analysis: Use a Chinese word-segmentation algorithm to improve Xapian's performance when processing large amounts of Chinese text.
- Learning to Rank: I want to add a ranking function based on supervised learning, trained with algorithms such as Support Vector Machines. For a given query, this function assigns a score to each document returned by the first retrieval pass, based on that document's feature vector, and then re-ranks the results. More about this can be found on the wiki page: http://trac.xapian.org/wiki/GSoC2011/LTR
- Spelling correction improvements: The aim of this project is to make Xapian's spelling correction system faster and much better. Currently, the system uses simple correction rules. This project will improve it significantly through performance improvements, new fuzzy search algorithms, phonetic algorithms, more tuning options, and some ranking improvements.
- Support Lua on Xapian: This project aims to add Lua bindings for Xapian. Lua is a powerful, fast, lightweight, embeddable scripting language that has been used in many industrial applications and games. Lua support could therefore make Xapian more widely used and more powerful, and many Lua projects could benefit from such bindings by using Xapian as a highly adaptable search engine library.
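The Learning to Rank idea re-scores first-pass results using a per-document feature vector. The following minimal Python sketch shows that re-ranking step. It assumes a linear model whose weight vector has already been learned, for example by a linear SVM as the idea suggests. The `Candidate` type, the feature values, and the weights are hypothetical illustrations, not part of Xapian's API, and training the model is out of scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A document from the first retrieval pass (hypothetical type)."""
    doc_id: int
    features: List[float]  # hypothetical per-(query, document) feature vector


def linear_score(weights: List[float], features: List[float]) -> float:
    """Score = dot product of learned weights and the document's features."""
    if len(weights) != len(features):
        raise ValueError("weight and feature vectors must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def rerank(candidates: List[Candidate], weights: List[float]) -> List[int]:
    """Return doc ids sorted by descending learned score.

    Ties keep their first-pass order, because the original index is the
    secondary sort key.
    """
    scored = [
        (linear_score(weights, c.features), i, c.doc_id)
        for i, c in enumerate(candidates)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc_id for _, _, doc_id in scored]


if __name__ == "__main__":
    # Hypothetical features: [first-pass score, title match, length penalty]
    first_pass = [
        Candidate(1, [10.0, 0.0, 3.0]),
        Candidate(2, [8.0, 1.0, 1.0]),
        Candidate(3, [9.0, 0.0, 0.5]),
    ]
    weights = [0.5, 2.0, -0.1]  # hypothetical learned weights
    print(rerank(first_pass, weights))  # [2, 1, 3]
```

A linear model keeps scoring cheap: each candidate needs only one dot product, so re-ranking the top few hundred first-pass results adds little cost.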
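The spelling correction idea mentions new fuzzy search algorithms without naming them. As one hedged illustration, the sketch below uses the standard Levenshtein edit distance to pick a suggestion from a word-frequency table. The `suggest` function, its `max_distance` default, the frequency table, and the frequency-based tie-breaking are all hypothetical choices made for this example. They do not describe Xapian's existing implementation.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(word: str, frequencies: Dict[str, int],
            max_distance: int = 2) -> Optional[str]:
    """Return the closest known word within max_distance (hypothetical helper).

    Returns None if the word is already known or nothing is close enough.
    Ties on distance prefer the more frequent word, then alphabetical order.
    """
    if word in frequencies:
        return None
    best_key = None
    best = None
    for candidate, freq in frequencies.items():
        d = levenshtein(word, candidate)
        if d > max_distance:
            continue
        key = (d, -freq, candidate)
        if best_key is None or key < best_key:
            best_key, best = key, candidate
    return best


if __name__ == "__main__":
    vocab = {"search": 10, "starch": 3, "sea": 50}  # hypothetical frequencies
    print(suggest("serach", vocab))  # search
```

Scanning the whole vocabulary is the simplest correct approach but scales poorly. A faster system would first narrow the candidates, for example with an n-gram index, before computing exact distances.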