POI enrichment with DEER

Linked Data enrichment is the process of adding, altering or deleting triples of a source dataset in order to obtain an enriched version of that dataset. The enriched dataset usually provides significant benefits for a specific use case or scenario. These benefits include (but are not limited to) more data (quantity), better data quality, better data organization (refined ontology), and interoperability with other datasets (interlinking). Over the last years, a few frameworks for RDF data enrichment have been developed. Such frameworks provide enrichment methods such as entity recognition, link discovery and schema enrichment.

DEER is a Linked Data enrichment framework that provides two main types of artifacts: atomic enrichment functions and atomic enrichment operators. Thus, a user who knows the type of enrichment to be carried out can define the sequence of enrichment functions and operators that must be applied to the dataset. The task of an atomic enrichment function is to determine the set of triples to be added to, altered in or deleted from the source dataset in order to generate the enriched dataset. Currently, DEER implements a set of atomic enrichment functions including dereferencing, linking, conformation, filter and NLP. For instance, the idea behind the dereferencing enrichment function is to find enrichment data in interlinked datasets. That is, for a source dataset that contains owl:sameAs or similar links, DEER dereferences all links from this dataset to other datasets using HTTP content negotiation, and then adds the relevant information from the returned triples to the source dataset. The idea behind the enrichment operators is to enable users to define a workflow for processing their input datasets, for example by splitting and merging datasets. However, the enrichment functions implemented by DEER (a) require manual configuration, and (b) do not exploit the geospatial and temporal features of input datasets such as POI data.
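To make the notion of a sequence of atomic enrichment functions more concrete, the following minimal Python sketch models a dataset as a set of (subject, predicate, object) triples and an enrichment function as a mapping from one dataset to another. It is not DEER's actual API: the names make_dereferencing, make_filter and run_sequence are hypothetical, the prefixed names are example data, and the HTTP content negotiation step is replaced by a caller-supplied lookup function so that the example runs offline.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# A triple is (subject, predicate, object); a dataset is an immutable set of triples.
Triple = Tuple[str, str, str]
Dataset = FrozenSet[Triple]
EnrichmentFunction = Callable[[Dataset], Dataset]

OWL_SAME_AS = "owl:sameAs"


def make_dereferencing(
    lookup: Callable[[str], Iterable[Triple]],
    wanted_predicates: Set[str],
) -> EnrichmentFunction:
    """Build a dereferencing-style function (hypothetical, not DEER's API).

    `lookup` is an assumption standing in for an HTTP request with content
    negotiation: given a URI, it returns the triples describing that URI.
    """

    def dereference(dataset: Dataset) -> Dataset:
        added = set()
        for subject, predicate, linked in dataset:
            if predicate != OWL_SAME_AS:
                continue
            for remote_subject, remote_predicate, remote_object in lookup(linked):
                # Copy only the wanted properties of the linked resource
                # onto the local resource.
                if remote_subject == linked and remote_predicate in wanted_predicates:
                    added.add((subject, remote_predicate, remote_object))
        return frozenset(dataset | added)

    return dereference


def make_filter(kept_predicates: Set[str]) -> EnrichmentFunction:
    """Build a filter-style function that keeps only the given predicates."""

    def keep(dataset: Dataset) -> Dataset:
        return frozenset(t for t in dataset if t[1] in kept_predicates)

    return keep


def run_sequence(dataset: Dataset, functions: List[EnrichmentFunction]) -> Dataset:
    """Apply the enrichment functions one after another."""
    for function in functions:
        dataset = function(dataset)
    return dataset


# Example data (invented for illustration).
source: Dataset = frozenset({
    ("ex:poi1", "rdfs:label", "Cafe Central"),
    ("ex:poi1", OWL_SAME_AS, "dbr:Cafe_Central"),
})
remote = {
    "dbr:Cafe_Central": [
        ("dbr:Cafe_Central", "geo:lat", "48.2104"),
        ("dbr:Cafe_Central", "dbo:abstract", "A coffee house."),
    ]
}
enriched = run_sequence(
    source,
    [
        make_dereferencing(lambda uri: remote.get(uri, []), {"geo:lat"}),
        make_filter({"rdfs:label", "geo:lat"}),
    ],
)
```

In this sketch the dereferencing step copies the latitude of the linked resource onto the local POI, and the filter step then drops the owl:sameAs link. A real configuration would also involve enrichment operators for splitting and merging datasets, which are omitted here.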

POI Enrichment

In SLIPO, we will adapt the DEER framework to effectively handle the enrichment of POI data. DEER was designed as a modular framework that can be easily extended and re-purposed. Therefore, in addition to using the enrichment functions already available in DEER, we intend to extend them by implementing POI-related enrichment functions, such as retrieving the location of a POI from a third-party geolocation service, determining whether a POI is valid at a given time stamp, and grouping POIs into areas of interest. Currently, DEER implements a supervised machine learning approach for generating the aforementioned sequence of enrichment functions and operators that must be applied to the input dataset. One limitation of this supervised approach is that it accepts only one input dataset and generates only one enriched dataset. In SLIPO, we will extend the approach by enabling it to accept a set of input datasets and to generate a set of output datasets. We also plan to apply unsupervised or weakly supervised approaches to automatically detect enrichment configurations for POI data, with a focus on its geospatial and temporal dimensions. Our approaches will aim not only to enrich POI data, but also to provide the enriched data in a format suitable for industrial consumption.
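As an illustration of what two of the POI-related enrichment functions mentioned above could compute, the following Python sketch checks the validity of a POI at a given time stamp and groups POIs into candidate areas of interest. It is only a sketch under stated assumptions: the Poi record, its valid_from and valid_to fields, and the function names are hypothetical, and the grouping uses a simple fixed-size latitude/longitude grid as a stand-in for whatever method SLIPO eventually adopts.

```python
from dataclasses import dataclass
from datetime import datetime
from math import floor
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Poi:
    """Hypothetical POI record; the field names are assumptions."""
    name: str
    lat: float
    lon: float
    valid_from: Optional[datetime] = None  # None means "no known start"
    valid_to: Optional[datetime] = None    # None means "still valid"


def is_valid_at(poi: Poi, when: datetime) -> bool:
    """Return True if `when` lies in the half-open interval [valid_from, valid_to)."""
    if poi.valid_from is not None and when < poi.valid_from:
        return False
    if poi.valid_to is not None and when >= poi.valid_to:
        return False
    return True


def group_into_cells(
    pois: Iterable[Poi], cell_deg: float = 0.01
) -> Dict[Tuple[int, int], List[str]]:
    """Group POI names by grid cell (a simple stand-in for area-of-interest detection)."""
    cells: Dict[Tuple[int, int], List[str]] = {}
    for poi in pois:
        key = (floor(poi.lat / cell_deg), floor(poi.lon / cell_deg))
        cells.setdefault(key, []).append(poi.name)
    return cells


# Example data (invented for illustration).
cafe = Poi("Cafe", 48.2104, 16.3655, valid_to=datetime(2015, 1, 1))
museum = Poi("Museum", 48.2109, 16.3651)
airport = Poi("Airport", 48.2555, 16.4044)

areas = group_into_cells([cafe, museum, airport])
open_in_2016 = [p.name for p in (cafe, museum, airport)
                if is_valid_at(p, datetime(2016, 6, 1))]
```

A grid keeps the example short and deterministic, but it splits clusters that straddle a cell border; a density-based clustering method would be a natural alternative for real data.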