Google Summer of Code 2013 Apache Software Foundation

FOAF Co-reference Based Entity Disambiguation Engine In Apache Stanbol

by Dileepa Jayakody for Apache Software Foundation

The proposed project focuses on developing an 'Entity Disambiguation Engine' in Apache Stanbol by computing co-reference relations in friend-of-a-friend (FOAF) datasets. On the web, the same entity (a person or an organization) can be referred to by different names, and the same name can refer to different entities, which leads to the 'name ambiguity' problem. This problem reduces the accuracy and relevance of the results inferred by semantic engines, so effective disambiguation techniques are needed as part of their enhancement process. This proposal uses FOAF profiles as a data source and processes them to resolve the name ambiguity problem. FOAF is a vocabulary for describing people, organizations and groups as linked data, forming an entity network on the web. The relationships between FOAF instances can be used to derive new knowledge about entities using semantic techniques. Several FOAF instances may describe the same entity in different ways and with different information. It is therefore essential to identify which FOAF instances on the web describe the same entity and to establish the co-reference relationship between them. The co-reference analysis can use FOAF attributes such as mbox, homepage and weblog as unique identifiers (inverse functional properties) to match FOAF instances, group them into co-referent clusters, and use those clusters to disambiguate entities. This project aims to develop a comprehensive disambiguation algorithm that identifies and clusters co-referent FOAF instances describing the same entity. Furthermore, integrating FOAF profile support into Stanbol can help build a user network in content management systems and improve access control of content using social graph techniques.
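The clustering step described above, matching FOAF instances on inverse functional properties such as mbox, homepage and weblog, can be illustrated with a small sketch. This is not the engine's actual implementation and does not use any Stanbol API: the function name `cluster_coreferent`, the dictionary-based profile format, and the sample profile ids and values are all assumptions made for illustration. It uses a union-find structure so that matches are transitive (if A shares an mbox with B, and B shares a homepage with C, all three fall into one cluster).

```python
from collections import defaultdict

# Inverse functional properties named in the abstract. Two profiles that
# share a value for any of these are assumed to describe the same entity.
IFP_KEYS = ("mbox", "homepage", "weblog")


def cluster_coreferent(profiles):
    """Group profile ids that share at least one IFP value, transitively.

    `profiles` is a hypothetical format: {profile_id: {property: value}}.
    Returns a sorted list of sorted id lists, one list per cluster.
    """
    parent = {pid: pid for pid in profiles}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (property, value) -> first profile id carrying it
    for pid, props in profiles.items():
        for key in IFP_KEYS:
            value = props.get(key)
            if not value:
                continue
            # Only whitespace is trimmed here; real URI or mailbox
            # normalization is deliberately left out of this sketch.
            ifp = (key, value.strip())
            if ifp in first_seen:
                union(first_seen[ifp], pid)
            else:
                first_seen[ifp] = pid

    clusters = defaultdict(set)
    for pid in profiles:
        clusters[find(pid)].add(pid)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Made-up sample data: "a", "b" and "c" are linked through a shared mbox
# and a shared homepage; "d" has the same name but no shared identifier.
sample_profiles = {
    "a": {"name": "John Smith", "mbox": "mailto:john@example.org"},
    "b": {"name": "J. Smith", "mbox": "mailto:john@example.org",
          "homepage": "http://example.org/~john"},
    "c": {"name": "Johnny S.", "homepage": "http://example.org/~john"},
    "d": {"name": "John Smith", "mbox": "mailto:other@example.com"},
}
sample_clusters = cluster_coreferent(sample_profiles)
```

In this sample, profiles "a" and "d" carry the same name but end up in different clusters, while "a", "b" and "c" are grouped despite their differing names. That is the disambiguation behaviour the abstract describes: names alone are not trusted, and shared identifiers decide which instances co-refer.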