GSoC/GCI Archive
Google Summer of Code 2013

DBpedia & DBpedia Spotlight

Web Page:

Mailing List:

Almost every major Web company has now announced work on a knowledge graph, including Google’s Knowledge Graph, Yahoo!’s Web of Objects, Walmart Labs’ Social Genome, Microsoft’s Satori Graph / Bing Snapshots and Facebook’s Entity Graph.

DBpedia is a community-run project that has been working on a free, open-source knowledge graph since 2006. DBpedia currently exists in 97 different languages and is interlinked with many other databases (e.g. Freebase, New York Times, CIA Factbook) and, we hope, through this GSoC, with Wikidata too. The knowledge in DBpedia is exposed through a set of technologies called Linked Data. Linked Data has been revolutionizing the way applications interact with the Web. While Web 2.0 technologies opened up much of the “guts” of websites so that third parties could reuse and repurpose data on the Web, they still require developers to create one client per target API. With Linked Data technologies, all APIs are interconnected via standard Web protocols and languages.

One can navigate this Web of facts with standard Web browsers or automated crawlers, or pose complex queries with SQL-like query languages (e.g. SPARQL). Have you thought of asking the Web about all cities with low crime rates, warm weather and open jobs? That's the kind of query we are talking about.
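To make the idea of querying the Web of facts a bit more concrete, here is a minimal sketch in Python that prepares a SPARQL query for a DBpedia endpoint. It is only an illustration under stated assumptions: the endpoint URL, the `dbo:City` class and the `dbo:populationTotal` property are assumed to be available as written, and the query filters on population only, since crime, weather and job data would have to come from additional interlinked datasets. The helper name `build_request` is ours and not part of any DBpedia tool.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: DBpedia's public SPARQL endpoint lives at this address.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: dbo:City and dbo:populationTotal are the relevant
# DBpedia ontology terms; adjust them to the dataset you query.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city ?name ?population WHERE {
  ?city a dbo:City ;
        rdfs:label ?name ;
        dbo:populationTotal ?population .
  FILTER (?population > 1000000 && lang(?name) = "en")
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query.

    The query travels in the 'query' URL parameter, and the Accept header
    asks for results in the SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:40] + "...")
```

Passing `request` to `urllib.request.urlopen` and decoding the response with `json.load` would return the result bindings, provided the endpoint is reachable and accepts the query as written.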

This new Web of interlinked databases provides useful knowledge that can complement the textual Web in many ways. See, for example, how bloggers tag their posts or assign them to categories in order to organize and interconnect their blog posts. This is a very simple way to connect “unstructured” text to a structure (a hierarchy of tags). For more advanced examples, see how the BBC created its World Cup 2010 website by interconnecting textual content and facts from its knowledge base. Identifiers and data provided by DBpedia were greatly involved in creating this knowledge graph. Or, more recently, did you see that IBM's Watson used DBpedia data to win the Jeopardy challenge?

DBpedia Spotlight is an open source (Apache license) text annotation tool that connects text to Linked Data by marking names of things in text (we call that Spotting) and selecting between multiple interpretations of these names (we call that Disambiguation). For example, “Washington” can be interpreted in more than 50 ways, including a state, a government or a person. You can already imagine that this is not a trivial task, especially when we're talking about 3.64 million “things” of 320 different “types” with over half a billion “facts” (July 2011).
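The two steps can be illustrated with a toy sketch in Python. This is not DBpedia Spotlight's actual algorithm or API: the candidate dictionary, its context words and the overlap-based scoring are all hypothetical simplifications, chosen only to show what Spotting and Disambiguation mean for the “Washington” example.

```python
import re

# Hypothetical surface-form dictionary: each known name maps to candidate
# entities, and each candidate carries a few illustrative context words.
CANDIDATES = {
    "washington": {
        "Washington_(state)": {"state", "seattle", "pacific", "olympia"},
        "George_Washington": {"president", "general", "army", "revolution"},
        "Washington,_D.C.": {"capital", "congress", "government", "district"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def spot(text):
    """Spotting: return the tokens that are known names of things."""
    return [token for token in tokenize(text) if token in CANDIDATES]


def disambiguate(surface_form, text):
    """Disambiguation: pick the candidate whose context words overlap most
    with the other words in the text (ties go to the first candidate)."""
    context = set(tokenize(text)) - {surface_form}
    scores = {
        entity: len(words & context)
        for entity, words in CANDIDATES[surface_form].items()
    }
    return max(scores, key=scores.get)


def annotate(text):
    """Run spotting, then disambiguation, over a piece of text."""
    return [(name, disambiguate(name, text)) for name in spot(text)]


print(annotate("Washington led the army as a general."))
print(annotate("Seattle is the largest city in Washington state."))
```

With millions of entities, a hand-written dictionary and a raw word-overlap count like these would not be enough, which is part of why the task is hard.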

After a successful GSoC 2012 with DBpedia Spotlight, this year we join forces with the DBpedia Extraction Framework and other DBpedia-family products. We got excited about our new ideas, and we hope you will get excited too!


  • DBpedia: Design a better / interactive display page (+ Search): An interactive, user-friendly interface to view (and search) DBpedia entities with some nice additions.
  • HadyElsahar - WikiData+DBpedia Idea proposal: Wikidata is a new knowledge base that is going to be the source for Wikipedia structured data such as infoboxes, language links and external links; it should be for structured data what Wikimedia Commons is for media files. In a nutshell, this project aims to make the data published by Wikidata usable inside DBpedia, that is, to convert it into triples and store it on DBpedia servers, and to provide a Live version that propagates all Wikidata changes to DBpedia at runtime.
  • Input Formats Generalization and Graph-Based Disambiguation Integration and Improvements: My proposal is a combination of two ideas, "Generalize input formats and add support for Google mention corpus" and "Efficient graph-based disambiguation and general performance improvements". This project will contribute to the architectural flexibility and general performance of DBpedia Spotlight.
  • Interface / Power tool for DBpedia testing metadata
  • Type inference to extend coverage: Use categories, infoboxes, abstracts, other DBpedia properties, etc. to infer the type of an entity, and thereby extend the coverage of the DBpedia ontology.