GSoC/GCI Archive
Google Summer of Code 2014

Python Software Foundation

License: Academic Free License 3.0 (AFL 3.0)

Web Page: http://wiki.python.org/moin/SummerOfCode/2014

Mailing List: https://mail.python.org/mailman/listinfo/soc2014-general

Python is an interpreted, general-purpose high-level programming language whose design philosophy emphasizes code readability. Python aims to combine "remarkable power with very clear syntax", and its standard library is large and comprehensive. Its use of indentation for block delimiters is unique among popular programming languages.

The Python Software Foundation serves as an umbrella organization for a number of projects written in the Python programming language.

This year, the following organizations are participating under the Python Software Foundation umbrella:

  • Core Python
  • Astropy
  • BinPy
  • GNU Mailman
  • Kivy
  • Mercurial
  • MNE-Python
  • MoinMoin Wiki
  • pgmpy
  • PyDy
  • PyPy
  • scikit-image
  • scikit-learn
  • SciPy/NumPy
  • SCons
  • Scrapy
  • statsmodels
  • SunPy
  • TARDIS-SN
  • Theano
  • Vispy

Projects

  • [OpenHatch] "Open Source Comes to Campus" Bug Set Creator & Organizer In order to prepare for an on-campus event, the organizer picks out a set of small bugs of an appropriate difficulty for the audience. Currently, these bugs are handpicked by querying the web interface and copying bugs into a spreadsheet: a very labor-intensive process. My project adds a UI and back-end support to the OpenHatch website for generating a "bite-size" bug set by pulling data from our bug DB, allowing for user customization, storage and sharing of the set.
  • Astropy: Designing and Implementing a Framework for Propagation of Uncertai Astropy is a project to establish a convenient and powerful framework for astronomy-related data analysis. In most cases, astronomical data represent physical measurements. Any such measurement has an intrinsic uncertainty, which must be handled properly throughout any computation for the result to have physical meaning. This proposal suggests a framework for propagating uncertainty within Astropy and outlines the steps to achieve it.
  • Astropy: Enhancing the photutils package functionality The Astropy Project is a community effort to develop an Open Source Python package for astronomical data analysis. Currently, the photometry functionality is developed in an affiliated package called photutils. However, the integration of the Astropy core modules in photutils is still very limited. During GSoC 2014 I plan to improve the integration of the core modules in photutils and thus extend its functionality.
  • AstroPy: High performance ASCII table reader and memory view tables Currently, the astropy.io.ascii package contains support for reading and writing a number of text-based formats. For simple formats, it would be very useful to have an optimized parser that can efficiently read table data from large files. My proposal will involve implementing fast reading and writing for these formats. It will also include the possibility of implementing memory mapping for ASCII parsers. If all of this is done, I hope to work on general performance enhancement for AstroPy.
  • AstroPy: New remote services in astroquery package Astropy is a Python library for astronomy and astrophysics. One of its affiliated packages is astroquery. Astroquery offers APIs to web services to query astronomical data. The major plan is to add support for more services to the astroquery package. I will also fix astroquery issues to improve the stability and to extend existing web service interfaces.
  • Astropy: Reading/Writing Spectra This project will implement reading and writing of spectra from/to various file formats for the specutils affiliated package. Spectra are one of the most important measurements in astronomy. Over the last decades a wealth of spectral information has been collected in many different formats. This project aims to make sure that Astropy provides readers and writers for all of them.
  • BinPy: EXTENDING CORE LIBRARIES AND CLASSES BinPy is at a nascent stage now. Many more fundamental concepts and implementations can be added. In my project I propose to strengthen the core of BinPy by implementing as many concepts as possible to realize the objective of BinPy, i.e. to virtualize electronics. I wish to strengthen the core libraries to the extent that as many basic digital circuits and algorithms as possible can be simulated via BinPy.
  • Core Python - IDLE Improvements IDLE is an IDE which comes bundled with Python. It currently lacks full test coverage and is missing some common features, such as line numbering and the ability to integrate 3rd party code checkers. These are a few of the issues that make IDLE a “disposable” IDE: novices use it only to leave for another IDE once they have gained some knowledge. The project aims to address these issues by extending IDLE test coverage, adding line numbering, and adding the ability to integrate 3rd party code checkers.
  • CPython: Bring Unicode to Email As an undergraduate in Computer Sciences I want to implement support for internationalized email addresses and header values in Python's email package. The goal is a fully tested, standards-conformant implementation of RFC 6532 that maintains backward compatibility and offers a useful API.
  • Developing the WCSAxes Framework for plotting Astronomical Images - Astropy Plotting astronomical images is an essential part of research in astronomy, and this project is designed to develop a new framework to plot astronomical images in world coordinate systems (WCS). Although there are packages capable of plotting astronomical images, such as APLpy, we want to develop a package that can be completely integrated with Astropy. A basic implementation of the new framework, WCSAxes, already works, but a lot more work still needs to be done on it before it can be released.
  • GNU Mailman Command Line Interface Project to build command line tools for better usability of Mailman. The project has two phases: Mailman command line tools and Mailman CLI. The command line tools are a set of userspace commands that can give a quick glance of the system. They can be useful for automation tasks by making functionality a command away. The CLI for Mailman is to provide a comfortable shell through which users can query data in the installation based upon the state, using an understandable query language.
  • GSoC 2014: Extending Neural Networks Module for Scikit learn I propose to extend the Neural Networks module of scikit-learn with two state-of-the-art algorithms widely used in academia, research and data analysis: Extreme Learning Machines and Deep Networks.
  • Improving MoinMoin 2.0 GUI The current GUI of moin 2.0 still needs quite a lot of work. The aim of this project is to improve the GUI of moin 2.0 and to implement visualization of ACL rights.
  • Incremental mark and copy garbage collector for PyPy Python is an important platform for memory-constrained mobile devices such as mobile phones and PDAs. PyPy is a fast Python interpreter written in RPython. However, the current GCs still suffer from high space overhead. This project proposes an incremental mark and copy garbage collector to reduce the memory consumption of the current GCs and provide a new garbage collector suitable for mobile devices.
  • Kivy: Plyer enhancements This proposal is based on the ideas listed on the Plyer enhancements here: http://kivy.org/docs/gsoc.html#enhancements-to-mobile-platforms. I would like to work on providing a more complete set of platform-independent APIs for accessing features commonly found across desktop (Linux, Windows and Mac OS) and mobile devices (Android, iOS).
  • Missing Data handling in Python/statsmodels: ICE/MICE Multiple Imputation This project will implement a "multiple imputation using chained equations" routine in the statsmodels Python package. This will give researchers who use Python a more sophisticated treatment of missing data than the current complete-case analysis. In particular, scientists in the social and biomedical sciences can make better inferences with any model choice, per unit of costly data.
  • MNE-Python: Support Deep Structures in Source Spaces Subcortical brain structures mediate many cognitive processes, such as emotion and memory. The most widely used tool to study deep brain activity is fMRI. Although fMRI offers excellent spatial resolution, its temporal resolution is poor compared to other brain imaging techniques, such as EEG and MEG. MNE-Python currently employs algorithms to localize sources of cortical M/EEG activity. I propose to utilize these algorithms to localize sources from subcortical spaces.
  • MNE-Python: Web access to MEG data The main objective of the project is to allow web access to MEG data through static HTML files created from MNE-Python. The result will be a tool which provides a summary of MEG analysis results and lets scientists explore them with minimal scripting.
  • MoinMoin: Improve Blog and ticket items The main aim of the project is to improve the ticket and the blog items. This would include solving the existing problems and adding some more features that are necessary for these items.
  • pgmpy : Implementation of Undirected Graphical Models and its algorithms I plan to implement the architecture for Markov Networks, including creation and modification of graphs, representation of node and edge potentials, and conditional independencies in MRFs. I also plan to implement the following algorithms: exact inference algorithms, including triangulation heuristics, creation of junction trees, and belief propagation; a heuristic inference algorithm (the alpha-expansion algorithm); and inference using sampling (Gibbs sampling).
  • Pgmpy : Parsing from and writing to standard PGM file formats pgmpy is working on reading from and writing to various file formats. ProbModelXML is one of them. More formats, such as Elvira, DNET - 1, XMLBIF, MSBNx, pomdpX, and Cassandra’s Format, are now being added for compatibility with pgmpy. My project deals with implementing the pomdpX and XMLBIF formats during the course of GSoC. This way, models can be specified in a uniform format and readily converted to Bayesian or Markov model objects.
  • PyPy: Improvements for Bytearrays and Unicode strings in PyPy. Many operations on bytearrays internally make unnecessary copies, giving these operations a worse complexity than expected. The first part of this project will be fixing these operations. The representation of unicode strings on PyPy, at both the interpreter and application level, is platform-dependent. The second part of this project will be to move unicode strings to UTF-8 internally with an application-level UTF-32/"wide-build" interface.
  • scikit-image : Graph based segmentation algorithms This project will implement region adjacency graphs and graph-based image segmentation algorithms. Initially I will implement graph primitives that will be useful for region adjacency graph based and other segmentation algorithms. Later I will implement segmentation algorithms using the primitives defined.
  • scikit-image: Building an Interactive Gallery Currently, the examples gallery is a static collection of code snippets and output images. We'd like users to be able to modify the code there and see the result. The outcome of this project would be a web-app that launches a sandboxed, resource-controlled Python environment (probably built on top of Docker) that executes code snippets and provides the resulting image as a response. (extract from https://github.com/scikit-image/scikit-image/wiki/GSoC-2014#projects)
  • Scikit-learn - Add Sparse Input Support for Ensemble Methods, and Sparse Ou Scikit-learn is an open source machine learning library that gives users access to cutting edge implementations of data classifying techniques. Growth in data set size means memory limitations are encountered more frequently. Improvements will be made to support sparse input and output formats, making larger data sets and multiclass methods more feasible to work with when using large amounts of sparse data.
  • Scikit-learn: Improved Linear Models My project aims at improving linear modelling in scikit-learn by pursuing the following goals. 1. Random / Cyclic coordinate descent 2. Finishing Gael's Logistic Regression CV PR 3. Finishing Larsman's Multinomial Regression PR + MultinomialRegressionCV 4. Strong Rules for (ElasticNet + Lasso) and (ElasticNetCV + LassoCV) 5. (L1 + L2) regression using CDN 6. Fixing all issues that I might face while dealing with the above goals.
  • scikit-learn: Locality sensitive Hashing for approximate neighbor search Several variants of LSH-ANN methods will be prototyped and evaluated. After identifying the best method, the hashing algorithms will be implemented. Then, using the results obtained from the prototyping stage, the storage and querying structure for ANN will be implemented. After that, the ANN part will be integrated into the sklearn.neighbors module. As these activities proceed, testing, examples and documentation will be covered. Benchmarking will be done to assess the implementation.
  • SciPy/NumPy- enhancements in scipy.special (hyp2f1, sph_harm) The tentative areas of work shall be: Gaussian Hypergeometric Functions (2F1) and Spherical Harmonic Functions. The implementation of the hypergeometric functions in scipy.special is lacking in several respects, and is not optimal in many parameter regimes. The focus shall be to close the present loopholes. Further, the spherical harmonic function shall be enhanced through the use of recurrence relations, and a function to evaluate ellipsoidal harmonic functions shall be created.
  • SciPy: Rewrite and improve cluster package in Cython According to the roadmap to SciPy 1.0, the `cluster` package needs a Cython rewrite to make it more maintainable and efficient. Besides, there's room for improvement in this package. Some useful features can be added, and performance on large datasets can be improved with better algorithms and an optimized BLAS library.
  • Scrapy Core API cleanup & per-spider settings The proposed project aims to add a new mechanism for overriding Scrapy settings in spiders (classes where users define crawling behaviour). To do that, and in pursuit of a simpler and easier-to-use API, a major refactoring of the core code for the crawl process and settings population is needed. This proposal attempts to describe all required changes and estimate how much time they would take.
  • Statsmodels: State Space Models This project would introduce general discrete time state space models to the Statsmodels' Time Series Analysis, provide a general structure for their representation, allow optimal estimation of unknown states via a performant multivariate Kalman filter, and provide specific functionality for the representation and estimation of the very common class of (Vector) Autoregressive Moving Average (VARMA) models.
  • SunPy : LightCurve Refactor Idea This project aims to shift the base data structure of the lightcurve class from the current Pandas DataFrame to Astropy Table. DataFrame, while good for statistical purposes, was not written with astronomers in mind. This project would add features similar to those provided by DataFrame to Astropy Table.
  • SunPy : Re-implementation of sunpy.wcs as sunpy.coordinates using astropy The current sunpy.wcs has methods defined for transformation between different solar coordinate frames, but they do not support transformation between different coordinate representations. This project aims to re-implement sunpy.wcs as sunpy.coordinates, basing the implementation on the idea described in APE5. In doing so, it hopes to achieve a much more user-friendly interface to the coordinate systems, as well as to add more functionality to the existing SunPy.
  • SunPy: A ginga based data explorer / database browser Exploring solar data for interesting events and extracting important information is essential to research. Currently, no such GUI program exists for Python. However, Ginga, with its current capabilities, offers a very good base that can be expanded to meet the requirements for solar data. Ginga is designed around a plugin system. This enables SunPy-specific plugins to be written that can be installed by any user.
  • SunPy: Re-implementation of sunpy.wcs as sunpy.coordinates using astropy SunPy provides routines for handling the representation and various transformations of solar coordinate information - with coordinate systems such as the Heliographic and Heliocentric. However, the Astropy API, and astropy.coordinates in particular, provides an opportunity to re-use already existing code which is efficient by design, as per APE5. My job would be to utilize Astropy's API in the reimplementation of the sunpy.wcs package.
  • TARDIS-SN Restructuring and Optimization The goal of the project is to restructure, profile and optimize the TARDIS-SN simulation code. If fully completed, it will leave the core TARDIS-SN simulation routines easier to maintain and understand, and faster to run.
  • Tarun Gaba(PyDy) - PyDy-Viz Improvements and Enhancements(Draft) PyDy-Viz is a visualization package for managing 3D visualizations of dynamic systems. Currently PyDy relies on external tools for numerical integration. This project aims at providing: - A fast JavaScript-based numerical integration module. - Support for IPython and MGView. - Other feature enhancements (support for CAD/Blender objects etc.).
  • Theano: Lower Memory Usage Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. I'd like to analyse and optimize the memory used by Theano functions during their execution. The current code takes only a naive greedy approach to memory allocation, coupled with a garbage collector that deallocates memory once it is no longer needed. This project would do better by possibly lowering the maximum memory allocated at once and avoiding reallocations.
  • Vispy in the Browser: Online Backend & IPython Integration Web-based visualization has high potential. Therefore, in order to create "next-generation" interactive visualization software, Vispy wants to combine its abilities with the browser's. An online backend should be implemented that will help users interact with big data from their browsers.
  • Vispy: Implementing Visuals layer and related functionality Vispy currently offers a Pythonic interface to OpenGL, gloo, which requires knowledge of OpenGL and shaders. The next step towards achieving Vispy's primary goal, i.e. independence from OpenGL for end users, is the visuals layer. The visuals layer provides an abstraction layer over gloo and lets you create visual objects on the scene with a Pythonic interface. The objective of this proposal is to implement priority visuals and related functionality via the visuals architecture.