GSoC/GCI Archive
Google Summer of Code 2014

Shogun Machine Learning Toolbox

License: GNU General Public License version 3.0 (GPLv3)

Web Page: http://shogun-toolbox.org/page/Events/gsoc2014_ideas

Mailing List: http://news.gmane.org/gmane.comp.ai.machine-learning.shogun

We are the team that develops the Shogun machine learning toolbox. Shogun is designed for unified and efficient machine learning (ML) across a broad range of data types and learning settings, such as classification, regression, and exploratory data analysis. It can be used from various programming languages with almost identical syntax. Shogun was started by leading members of the machine learning community. It has diversified over time and is now used by researchers and data practitioners to tackle problems across many areas of science. We have various ideas for this year's Summer of Code. Mainly, we are looking to extend the library in two ways:
  1. Improving Shogun's infrastructure.
  2. Integrating existing and new machine learning algorithms.
Please read how to get accepted for GSoC with us before applying. Then please use the scheme shown below for your student application. If you have any questions, ask on the mailing list (shogun-list@shogun-toolbox.org; please note that you must be subscribed in order to post) or on IRC (#shogun on freenode).

Projects

  • application to the idea of Variational Learning for Recommendations: This is an application for the project idea "Variational Learning for Recommendations with Big Data".
  • Essential Deep Learning Modules: Deep learning is an exciting new area in machine learning. It is especially important now because it can leverage the massive amounts of unlabeled data becoming available through the internet. It would be a great addition to the Shogun toolbox.
  • Fundamental Machine Learning Algorithms: The aim of the project is to implement certain fundamental ML algorithms, refactor the code of certain existing algorithms to boost performance, and create attractive IPython notebooks. We propose to implement algorithms for decision tree learning and fast kernel density estimation. We also plan to refactor the PCA, KPCA and LARS code to make use of Shogun's own eigensolver. Finally, we plan to create notebooks with real-life examples of decision trees, KNN and regression techniques.
  • Large-Scale Multi-Label Classification: This proposal focuses on extending support for approximate inference in Shogun's current structured output framework: 1) implementing large-scale multi-label classification algorithms, such as online structured learning and multi-label prediction using feature hashing; 2) implementing calibrated label ranking for large-scale multi-label structured output learning problems.
  • large-scale structured prediction with approximate inference: The factor graph model in Shogun provides a good example of how to use a structured output SVM to learn the parameters of a graphical model. However, its compatibility with approximate inference has not yet been explored. In this project, three approximate inference algorithms will be implemented and evaluated on multi-label classification. Furthermore, to demonstrate scalability to large-scale applications, two computer vision demos, image inpainting and figure-ground segmentation, will be created.
  • OpenCV Integration and Computer Vision Applications: The main idea of the project is to integrate OpenCV into Shogun cleanly, that is, in such a way that if OpenCV changes its data structures, Shogun needs to change in only one place. We want to make Shogun the one-stop destination for developers adding machine learning to their applications.
  • Shogun Missionary & Shogun in Education: The project aims to boost Shogun's acceptance in the world. This would be done by writing new IPython notebooks and web demos.
  • Testing and Measuring Variable Interactions With Kernels: Statistical tests for independence and for equality of measures, two fundamental tools in statistical data analysis, often play an important role in feature selection algorithms. In structured input domains, several kernel-based measures yield powerful tests of this kind, and recent research shows that many of them have significant advantages over other approaches. This project aims to showcase several such ideas in one modular framework, along with demonstrations, and to explore further research opportunities in this domain.