A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants

Bibliographic Details
Main Authors: Eibeck, Andreas; Zhang, Shaocong; Lim, Mei Qi; Kraft, Markus
Other Authors: School of Chemical and Biomedical Engineering
Format: Article
Language: English
Published: 2024
Subjects: Engineering; Semantic web; Linked data
Online Access: https://hdl.handle.net/10356/178350
Institution: Nanyang Technological University
Affiliations: School of Chemical and Biomedical Engineering; Cambridge Centre for Advanced Research and Education in Singapore
Citation: Eibeck, A., Zhang, S., Lim, M. Q. & Kraft, M. (2024). A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants. Journal of Web Semantics, 80, 100815. https://dx.doi.org/10.1016/j.websem.2024.100815
DOI: 10.1016/j.websem.2024.100815
ISSN: 1570-8268
Scopus EID: 2-s2.0-85186116420
Version: Published version
Funding: This project is funded by the National Research Foundation (NRF), Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme. Part of this work was supported by Towards Turing 2.0 under the EPSRC Grant EP/W037211/1 and The Alan Turing Institute.
Rights: © 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Added to repository: 2024-06-12
Full-text format: application/pdf
Collection: DR-NTU (NTU Library), Singapore
Abstract: Knowledge graphs store and link semantically annotated data about real-world entities from a variety of domains and on a large scale. The World Avatar is based on a dynamic decentralised knowledge graph and on semantic technologies to realise complex cross-domain scenarios. Accurate computational results for such scenarios require the availability of complete, high-quality data. This work focuses on instance matching, one of the subtasks of automatically populating the knowledge graph with data from a wide spectrum of external sources. Instance matching compares two data sets and seeks to identify instances (data, records) referring to the same real-world entity. We introduce AutoCal, a new instance matcher which does not require labelled data and runs out of the box for a wide range of domains without tuning method-specific parameters. AutoCal achieves results competitive with recently proposed unsupervised matchers from the field of Machine Learning. We also select an unsupervised state-of-the-art matcher from the field of Deep Learning for a thorough comparison. Our results show that neither AutoCal nor the state-of-the-art matcher is superior in matching quality, while AutoCal has only moderate hardware requirements and runs 2.7 to 60 times faster. In summary, AutoCal is particularly well suited for use in an automated environment. We present its prototypical integration into the World Avatar and apply AutoCal to the domain of power plants, which is relevant for practical environmental scenarios of the World Avatar.
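The abstract defines instance matching as comparing two data sets to find records that refer to the same real-world entity, without labelled training data. The sketch below illustrates only that generic task; it is not AutoCal, whose scoring and calibration are described in the article. As a stand-in technique, it averages Python's `difflib` string similarity over chosen properties and accepts a pair only when each record is the other's best candidate (a mutual-best-match heuristic), which needs neither labels nor a similarity threshold. The function names `similarity` and `match`, the property names and the power-plant records are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: dict, b: dict, props: list[str]) -> float:
    """Mean case-insensitive string similarity over properties present in both records."""
    scores = []
    for p in props:
        x, y = a.get(p), b.get(p)
        if x is None or y is None:
            continue  # a value missing on either side is skipped, not penalised
        scores.append(SequenceMatcher(None, str(x).lower(), str(y).lower()).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def match(src: list[dict], tgt: list[dict], props: list[str]) -> list[tuple[int, int]]:
    """Return (i, j) pairs where src[i] and tgt[j] are each other's best candidate."""
    if not src or not tgt:
        return []
    # Exhaustive pairwise scores: O(len(src) * len(tgt)) comparisons.
    sim = [[similarity(s, t, props) for t in tgt] for s in src]
    # Best target index for every source record, and best source index for every target.
    best_tgt = [max(range(len(tgt)), key=lambda j: row[j]) for row in sim]
    best_src = [max(range(len(src)), key=lambda i: sim[i][j]) for j in range(len(tgt))]
    # Keep only mutual best matches.
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Hypothetical power-plant records from two sources (invented for illustration).
SOURCE = [
    {"name": "Riverside Gas Plant", "fuel": "gas"},
    {"name": "Hilltop Wind Farm", "fuel": "wind"},
]
TARGET = [
    {"name": "Hilltop Windfarm", "fuel": "Wind"},
    {"name": "Riverside Gas Power Plant", "fuel": "Gas"},
]

if __name__ == "__main__":
    print(match(SOURCE, TARGET, ["name", "fuel"]))  # [(0, 1), (1, 0)]
```

Mutual best matching always pairs records in a one-to-one setting, even when a source record has no true counterpart, and the exhaustive comparison grows with the product of the data set sizes; practical matchers typically add blocking and an explicit decision rule for non-matches.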