Fusing POI data with FAGI: Goals and Challenges

POI data are valuable resources that form the cornerstone of many sectors of our economy: from navigation and logistics to tourism, leisure and social networks, applications and end users consume, and even produce, POI data. Owing to the wide range of applications and vendors, as well as to commercial competition, POI data are produced in a fragmented and disconnected way. Specifically, POIs from different datasets do not adhere to common schemas and formats, contain partial or even conflicting metadata, and are rarely updated. On top of that, POIs are inherently ambiguous: their names and coordinates may vary from dataset to dataset, which often makes it difficult to tell different POIs apart or to interlink records that describe the same POI.

To face these challenges and produce data of higher completeness, coverage, consistency and quality, fusion processes need to be applied to POI datasets. Fusion takes as input two (or more) linked POI entities and produces a unified, consolidated representation of them that is more complete, concise and accurate than any of the initial linked entities. It comprises two steps: first, matching the properties (metadata) of the linked entities; second, applying a fusion action, potentially a different one for each pair, to every pair of matched properties.
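The two-step process can be illustrated with a minimal Python sketch. It assumes that each POI entity is a flat dictionary mapping property names to string values, and every name in it (`PROPERTY_MATCHES`, `ACTIONS`, `keep_left`, `keep_longest`, `concatenate`, `fuse`, and the example properties) is hypothetical; these are not FAGI's actual fusion actions or data model.

```python
from typing import Callable, Dict, Optional

# Hypothetical, simplified entity model: property name -> value.
Entity = Dict[str, str]
Action = Callable[[Optional[str], Optional[str]], Optional[str]]

# Step 1 (property matching): map the right-hand dataset's property names
# onto the left-hand dataset's schema. These names are hypothetical.
PROPERTY_MATCHES: Dict[str, str] = {"label": "name", "tel": "phone", "web": "website"}


# Step 2 (fusion actions): each action combines two values into one.
def keep_left(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Keep the left value, falling back to the right one if it is missing."""
    return a if a is not None else b


def keep_longest(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Keep the longer non-empty value (a crude proxy for 'more complete')."""
    values = [v for v in (a, b) if v]
    return max(values, key=len) if values else None


def concatenate(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Keep both distinct non-empty values, joined by a separator."""
    values = [v for v in (a, b) if v]
    return " | ".join(dict.fromkeys(values)) or None


# Which action to apply to each matched property (hypothetical choices).
ACTIONS: Dict[str, Action] = {"name": keep_longest, "cuisine": concatenate}


def fuse(left: Entity, right: Entity, actions: Dict[str, Action],
         default: Action = keep_left) -> Entity:
    # Step 1: rename the right entity's properties to the left schema.
    aligned = {PROPERTY_MATCHES.get(k, k): v for k, v in right.items()}
    fused: Entity = {}
    # Step 2: apply the chosen action to every pair of matched properties.
    for prop in sorted(set(left) | set(aligned)):
        value = actions.get(prop, default)(left.get(prop), aligned.get(prop))
        if value is not None:
            fused[prop] = value
    return fused


left = {"name": "Cafe Roma", "phone": "+30 210 1234567", "cuisine": "coffee"}
right = {"label": "Cafe Roma Athens", "web": "http://example.org", "cuisine": "italian"}
print(fuse(left, right, ACTIONS))
```

Keeping the property matching separate from the action table mirrors the two steps: changing how a given property is fused only requires editing `ACTIONS`, while schema differences are handled once in `PROPERTY_MATCHES`.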

SLIPO aims to extend FAGI, a tool developed within the GeoKnow project, and to adapt it to the specific requirements of fusing POI data. FAGI is the only framework that supports the fusion of geospatial Linked Data. It receives as input two geospatial RDF datasets and a set of links between their entities, and produces a new dataset in which each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties of two linked entities, according to a selected fusion action.
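At the dataset level, this input/output contract can be sketched over plain (subject, predicate, object) triples instead of a real RDF library. The `fused:` URI prefix, the `ex:` predicates and the three action names (`keep_a`, `keep_b`, `keep_both`) are hypothetical illustrations, and only linked entities are carried into the output; this is not how FAGI itself stores, fuses or mints data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Index = Dict[str, Dict[str, List[str]]]


def index_by_subject(triples: Iterable[Triple]) -> Index:
    """Group triples as subject -> predicate -> list of objects."""
    idx: Index = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        idx[s][p].append(o)
    return idx


def choose(action: str, va: List[str], vb: List[str]) -> List[str]:
    """Apply one (hypothetical) fusion action to a matched property's values."""
    if action == "keep_a":
        return va or vb  # prefer dataset A, fall back to B
    if action == "keep_b":
        return vb or va  # prefer dataset B, fall back to A
    if action == "keep_both":
        return va + [v for v in vb if v not in va]
    raise ValueError(f"unknown fusion action: {action}")


def fuse_datasets(a: Iterable[Triple], b: Iterable[Triple],
                  links: Iterable[Tuple[str, str]],
                  actions: Dict[str, str], default: str = "keep_a") -> Set[Triple]:
    ia, ib = index_by_subject(a), index_by_subject(b)
    fused: Set[Triple] = set()
    for sa, sb in links:
        target = f"fused:{sa}"  # hypothetical URI-minting scheme
        pa, pb = ia.get(sa, {}), ib.get(sb, {})
        # One fusion action per matched property (here: same predicate).
        for pred in set(pa) | set(pb):
            values = choose(actions.get(pred, default), pa.get(pred, []), pb.get(pred, []))
            fused.update((target, pred, o) for o in values)
    return fused


a = {("a:1", "ex:name", "Cafe Roma"), ("a:1", "ex:geom", "POINT (23.7275 37.9838)")}
b = {("b:7", "ex:name", "Caffe Roma"), ("b:7", "ex:geom", "POINT (23.7276 37.9839)"),
     ("b:7", "ex:phone", "+30 210 1234567")}
links = [("a:1", "b:7")]
actions = {"ex:name": "keep_both", "ex:geom": "keep_b"}
for triple in sorted(fuse_datasets(a, b, links, actions)):
    print(triple)
```

Here property matching is reduced to "same predicate" for brevity; in practice the two datasets' schemas differ, so a matching step like the one described earlier would run before the actions are applied.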

Previewing linked geometries in FAGI

FAGI offers a set of 17 fusion actions that handle both spatial and non-spatial metadata. For example, FAGI supports concatenating strings and geometries, shifting and re-scaling geometries, and jointly handling semantically related properties. Moreover, FAGI implements advanced facilities for fusing geospatial data, including clustering and batch handling of linked entities, link discovery and recommendation for neighborhoods of unlinked entities, and batch application of fusion actions. Finally, the tool incorporates an initial learning mechanism that recommends fusion actions, as well as OSM categories for annotating the fused entities.
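The geometric actions mentioned above (shifting, re-scaling and concatenating geometries) can be illustrated on geometries represented as lists of (x, y) vertices. This is a simplified sketch, not FAGI's implementation: the function names are hypothetical, the centroid is approximated by the mean of the vertices, rings are assumed to be given without a repeated closing vertex, and "concatenation" is modelled as collecting both geometries into one multi-part geometry.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Geometry = List[Point]  # vertices, without a repeated closing vertex


def vertex_centroid(g: Geometry) -> Point:
    """Mean of the vertices (a simplification of the true area centroid)."""
    n = len(g)
    return (sum(x for x, _ in g) / n, sum(y for _, y in g) / n)


def shift(g: Geometry, dx: float, dy: float) -> Geometry:
    """Translate every vertex by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in g]


def shift_to(g: Geometry, ref: Geometry) -> Geometry:
    """Move g so that its centroid coincides with the centroid of ref."""
    (cx, cy), (rx, ry) = vertex_centroid(g), vertex_centroid(ref)
    return shift(g, rx - cx, ry - cy)


def rescale(g: Geometry, factor: float) -> Geometry:
    """Scale g about its own centroid by the given factor."""
    cx, cy = vertex_centroid(g)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in g]


def concatenate_geometries(g1: Geometry, g2: Geometry) -> List[Geometry]:
    """Keep both geometries as parts of a single multi-part geometry."""
    return [g1, g2]


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
ref = [(10.0, 10.0), (12.0, 10.0), (12.0, 12.0), (10.0, 12.0)]
print(shift_to(square, ref))
print(rescale(square, 2.0))
print(concatenate_geometries(square, ref))
```

Real geometries would normally be handled with a geometry library and projected coordinates; plain tuples keep the arithmetic of each action visible.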

Fusing geometries in FAGI

As a first step towards extending FAGI into a POI data fusion framework, we will adjust and enhance its existing fusion functionality (fusion actions, similarity functions and metrics, POI-specific dictionaries) so that it is optimized for fusing POI entities. Next, we will work towards automating the fusion process as much as possible, by creating configurable fusion rules and pipelines and by training machine learning models specific to POI data for recommending fusion actions. Finally, we will incorporate processes that take the provenance and evolution of POI entities and their metadata into account during fusion. Throughout all these stages, quality assessment will be a major concern: quality metrics will be used not only to validate the quality of the fused data, but also to assist and guide the fusion process.
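As an example of the kind of similarity function meant here, the sketch below scores a candidate pair of POIs by combining name similarity (`difflib.SequenceMatcher`) with great-circle distance (the haversine formula). The weight `w_name`, the `max_dist_m` cut-off, the function names and the dictionary keys are hypothetical choices for illustration, not FAGI's metrics.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Union

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres
POI = Dict[str, Union[str, float]]  # hypothetical keys: "name", "lat", "lon"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def poi_similarity(p: POI, q: POI, max_dist_m: float = 100.0, w_name: float = 0.6) -> float:
    """Weighted mix of name similarity and a linear distance-decay score."""
    name_sim = name_similarity(str(p["name"]), str(q["name"]))
    dist = haversine_m(float(p["lat"]), float(p["lon"]), float(q["lat"]), float(q["lon"]))
    spatial_sim = max(0.0, 1.0 - dist / max_dist_m)  # 0 beyond max_dist_m
    return w_name * name_sim + (1 - w_name) * spatial_sim


p = {"name": "Cafe Roma", "lat": 37.9838, "lon": 23.7275}
q = {"name": "Caffe Roma", "lat": 37.9839, "lon": 23.7276}
print(round(poi_similarity(p, q), 3))
```

Combining the two signals reflects the ambiguity problem of POIs: neither a similar name nor nearby coordinates alone is enough to decide that two records describe the same place.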

Our goal is to create a robust framework that helps and guides non-expert users to produce POI data of higher quality, completeness and coverage, faster and with minimal effort.