GSoC/GCI Archive
Google Summer of Code 2013

Wikimedia

Web Page: http://www.mediawiki.org/wiki/Summer_of_Code_2013

Mailing List: https://lists.wikimedia.org/mailman/listinfo/wikitech-l

We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. We have other content projects including Wikimedia Commons, Wikidata and the most recent one, Wikivoyage. We also maintain the MediaWiki engine and a wide collection of open source software projects around it.

But there is much more we can do: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can help with these goals in many ways.

Projects

  • Android app for MediaWiki translation An app for making translations in a MediaWiki site with the Translate extension, such as translatewiki.net or meta.wikimedia.org.
  • Bayesian Spam Filter A token (word) based Bayesian spam classifier for combating wiki spam problems. Besides words, it takes into account other factors such as capital letters and punctuation marks. It also adds support for large wikis with concurrent edits.
  • Improve support for book structures I will improve support for wikis like Wikisource and Wikibooks, whose content is structured in a book format. I will improve the existing BookManager extension to allow these wikis to structure pages and sections of a book into an easily navigable unit. Users will be able to enter information about the organizational structure of a book into a form, which will store the book's structure. This will then be used to (optionally) auto-generate navigation bars.
  • Incremental data dumps Currently, creating a database dump of larger Wikimedia sites takes a very long time, because it's always done from scratch. Creating a new dump based on the previous one could be much faster, but this is not feasible with the current XML format. This project proposes to create a new binary format for the dumps, which would allow efficient modification of a dump, and thus allow creating a new dump based on the previous one. Another benefit is that this format would also allow seeking, so a user can directly access the data they are interested in. A similar format will also be created, which will allow downloading only the changes made since the last dump and applying them to a previously downloaded dump.
  • Incremental updates for Kiwix (offline Wikipedia) This project was thought up in order to make Wikipedia available in remote places without a proper internet connection. As of now, users need to download the full database every time they need to update, which is cumbersome or impractical for a user with a slow internet connection. This project implements an incremental update feature in the existing Kiwix software. Once finished, this would greatly benefit many schools and other institutes in developing regions of the world. It will enable them to keep a local cache of the data, which updates itself automatically.
  • Internationalization and Right-To-Left Support in VisualEditor I propose working on a series of improvements to VisualEditor concentrating on support for RTL languages like Hebrew, Arabic, Persian and Pashto.
  • jQuery.IME extensions for Firefox and Chrome jQuery Input Method Editor is a collection of more than 150 input methods across several languages. It is the jQuery version of the input method tool, Narayam, which is used across several Wikimedia projects. Currently jQuery.IME is provided from the Wikimedia servers. This project mainly aims at: (1) Porting jQuery.IME to Firefox and Chrome extensions. (2) Providing on-demand loading of input methods for different languages rather than injecting all 150+ input methods on a web page. (3) Working out a solution to update the extension from the upstream project with minimal manual effort. As these extensions would allow the user to use the input methods on any website, not just on MediaWiki-enabled websites, they would be of immense help to users.
  • Language Coverage Matrix Dashboard The Language Coverage Matrix (LCM) dashboard would help automate tracking of the language support provided by the Language Engineering team, e.g. key maps, web fonts, translation, language selector, and i18n support for gender, plurals and grammar rules. The LCM would display this information and provide visualization graphs of language coverage using various search criteria, such as tools or languages. I will build this web-based dashboard using JavaScript libraries integrated with MySQL to manage the data. This project is useful for the Language Engineering team, since Wikimedia supports more than 300 languages. The tool will help the team analyse the features available for each language, so that it can efficiently prioritize and add features that are not yet available in a particular language. Overall, the project should lead to a more efficient and enhanced user experience on the wikis. This web-based dashboard will also help other products and communities by showing search results and visualization graphs for the same data.
  • MediaWiki-Moodle Extension My project will allow users of MediaWiki to display information about Moodle courses.
  • Mobilize Wikidata My project is aimed at extending Wikidata to make it accessible on mobile devices. Setting up Wikibase with MobileFrontend shows that while MobileFrontend can be used to achieve a mobile-friendly version of Wikidata, some problems arise because JS-based UIs do not render properly through MobileFrontend. My plan is to implement Wikibase without JavaScript to make it compatible with MobileFrontend.
  • Pronunciation Recording Extension Wiktionary is a multilingual, web-based project to create a free content dictionary, available in 158 languages. Each word in the dictionary has a separate page, and many words have a pronunciation file embedded in their page. However, many words, especially those from specialized fields such as Mathematics, Medicine and Physics, do not have an embedded pronunciation file. Wiktionary has an interface to upload the pronunciation of words, but it is complicated and time-consuming. Through this extension I plan to provide a "User-Friendly" interface to upload the pronunciation of a word. On completion, the project would be very useful to the Wiktionary Community and to the millions of people using this "Online Free Dictionary".
  • Prototyping inline comments The goal of this project is to make a prototype for an inline commenting system in which a user who lands on a wiki article can comment on some part of the text, and other users can optionally reply. This will be implemented as a separate extension. I will be using the OKFN technology available under the MIT license, which provides the user with tools to annotate text. API methods will be provided to retrieve comments and to update them.
  • Refactoring of Proofread Page extension Wikisource is one of the largest projects of the Wikimedia Foundation. However, its editing process is quite different from Wikipedia's because the goal is to publish already existing works in the Public Domain. This process relies almost entirely on the ProofreadPage extension, which adds to MediaWiki features related to scanned books: mainly, proofreading the book's text by comparing it to scanned images. As a consequence, the ProofreadPage extension is key to Wikisource's well-being. An example of the use of the ProofreadPage extension is http://en.wikisource.org/w/index.php?title=Page:Studies_of_a_Biographer_4.djvu/174&action=edit The development history of the extension has been somewhat discontinuous, leading to a non-modular and difficult-to-extend code base. The goal of the project is two-fold: (1) reorganizing and refactoring the extension, making it modular, non-redundant, and documented as needed to ease future work; (2) integrating the Visual Editor and extending it to make it compliant with the ProofreadPage extension's specific features.
  • Section handling in Semantic Forms The Semantic Forms extension is a useful and widely used MediaWiki extension that makes semantically structured data, contained within template calls, easy to create and edit. However, at present Semantic Forms does not support structuring wiki pages by page sections. This project would enable administrators to define page sections in the form definition for the structure of their wiki pages. It would also allow users to add data to those defined page sections using forms in the Semantic Forms extension. The project would also extend the Page Schemas extension to allow page sections to be defined.
  • UploadWizard: Book upload customization (from ideas page) The proposed book upload customization for Extension:UploadWizard would be activated as an UploadWizard Campaign. This customization would display buttons to import books and book metadata from external sources. More information at: Wikisource:Book uploader.
  • VisualEditor Math Equation Plugin Extend the mathematical functionality of VisualEditor. This will allow users unfamiliar with wikitext to insert and edit content that includes mathematical symbols and equations.
  • VisualEditor plugin for source code editing A plugin for VisualEditor for source code editing, providing support for inserting source code easily, with features such as simple grammar checking, indentation correction, and code beautifying. VisualEditor needs these features, and as a programmer I feel it is necessary to have these basic functions in every text editor that supports source code markup.
  • Wikidata language fallback and conversion Currently Wikidata stores multilingual content. Labels (names, descriptions, etc.) are expected to be written in every language, so that every user can read them in their own language. This works for a static data set with labels filled in for all languages, but that is not the case on Wikidata. This proposal aims to resolve these issues by displaying content from another language to users based on user preferences (some users may know more than one language), language similarity (language fallback chain), or the possibility of transliteration, and by allowing proper editing of this content.
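
The language fallback idea in the last project above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the project's implementation: the FALLBACKS table, the resolve_label function and the "en" final fallback are hypothetical names and values chosen for the example, and transliteration is left out.

```python
# Hypothetical fallback chains, made up for this example. A real wiki
# would take these from its own language configuration.
FALLBACKS = {
    "zh-hk": ["zh-hant", "zh"],
    "pt-br": ["pt"],
}


def resolve_label(labels, user_languages, final_fallback="en"):
    """Pick the first available label for a user.

    labels: dict mapping language code -> label text.
    user_languages: the user's preferred languages, most preferred first.
    Returns a (language_code, text) tuple, or None if nothing matches.
    """
    seen = set()
    candidates = []
    # Each preferred language is followed by its own fallback chain.
    for lang in user_languages:
        for code in [lang, *FALLBACKS.get(lang, [])]:
            if code not in seen:
                seen.add(code)
                candidates.append(code)
    # Assumed last resort when none of the preferences match.
    if final_fallback not in seen:
        candidates.append(final_fallback)

    for code in candidates:
        text = labels.get(code)
        if text:
            return code, text
    return None


if __name__ == "__main__":
    labels = {"en": "Hong Kong", "zh-hant": "香港"}
    print(resolve_label(labels, ["zh-hk"]))
```

Returning the language code together with the text lets an interface show which language a label actually came from, which matters when the user goes on to edit it.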