GSoC/GCI Archive
Google Summer of Code 2012

Xapian Search Engine Library

Web Page:

Mailing List:


Xapian is a Search Engine Library which aims to be fast, scalable, and flexible. It's used by many organizations around the world, including Debian, One Laptop per Child, and the Gmane mailing list archive. It supports probabilistic ranking and a rich set of boolean query operators. The core library is written in C++, with bindings to allow use from C#, Java, Perl, PHP5, Python, Ruby, Tcl and Lua.

You can:


  • Bi-gram Language Modeling: The bi-gram language modeling approach to information retrieval has been shown to outperform the three traditional IR approaches. Besides better retrieval performance, a bi-gram language model yields a rich set of bi-grams from the collection, which can be used for phrase searching, diversifying search results, and suggesting query reformulations to the user. A bi-gram language model would make Xapian a more powerful library for research in information retrieval.
  • Erlang Bindings: Bindings for Erlang using a linked-in port driver.
  • Node.js Bindings: The aim of this project is to provide a JavaScript API to Xapian for use in Node.js.
  • Omega: Dynamic Snippets: Create a xapian-core API for dynamically extracting query-relevant snippets, and integrate the API with Omega.
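
To illustrate the idea behind the bi-gram language modeling project, here is a minimal sketch of query-likelihood scoring with an interpolated bi-gram model. It does not use the Xapian API and is not the project's implementation. The function names, the smoothing scheme (linear interpolation with the collection model), and the weights `lam` and `mu` are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def counts(tokens):
    """Return (unigram counts, bigram counts) for a list of tokens."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def query_log_likelihood(query, doc, collection, lam=0.5, mu=0.5):
    """Log-probability of `query` under a smoothed bi-gram model of `doc`.

    Hypothetical helper: `doc` and `collection` are assumed to be non-empty
    token lists; `lam` and `mu` are illustrative interpolation weights.
    """
    if not query:
        return 0.0
    doc_uni, doc_bi = counts(doc)
    col_uni = Counter(collection)
    doc_len, col_len = len(doc), len(collection)

    def p_unigram(w):
        # Mix document and collection frequencies so that terms missing
        # from the document still receive some probability mass.
        return mu * doc_uni[w] / doc_len + (1 - mu) * col_uni[w] / col_len

    def p_bigram(prev, w):
        # Maximum-likelihood bi-gram estimate inside the document,
        # backed off to the smoothed unigram estimate.
        ml = doc_bi[(prev, w)] / doc_uni[prev] if doc_uni[prev] else 0.0
        return lam * ml + (1 - lam) * p_unigram(w)

    probs = [p_unigram(query[0])]
    probs += [p_bigram(a, b) for a, b in zip(query, query[1:])]
    if any(p == 0.0 for p in probs):
        # A term absent from the whole collection has zero probability.
        return float("-inf")
    return sum(math.log(p) for p in probs)


if __name__ == "__main__":
    d1 = "new york city guide".split()
    d2 = "york is a new city".split()
    collection = d1 + d2
    for name, doc in (("d1", d1), ("d2", d2)):
        print(name, query_log_likelihood(["new", "york"], doc, collection))
```

Both documents contain the words "new" and "york", but only the first contains them as an adjacent pair, so the bi-gram term gives it the higher score. A unigram model would see the two documents as much closer, which is the intuition behind using bi-grams for phrase-sensitive ranking.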