A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants
Knowledge graphs store and link semantically annotated data about real-world entities from a variety of domains and on a large scale. The World Avatar is based on a dynamic decentralised knowledge graph and on semantic technologies to realise complex cross-domain scenarios. Accurate computational re...
Main Authors: | Eibeck, Andreas, Zhang, Shaocong, Lim, Mei Qi, Kraft, Markus |
---|---|
Other Authors: | School of Chemical and Biomedical Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering, Semantic web, Linked data |
Online Access: | https://hdl.handle.net/10356/178350 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-178350 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1783502024-06-12T07:45:52Z A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants Eibeck, Andreas Zhang, Shaocong Lim, Mei Qi Kraft, Markus School of Chemical and Biomedical Engineering Cambridge Centre for Advanced Research and Education in Singapore Engineering Semantic web Linked data Knowledge graphs store and link semantically annotated data about real-world entities from a variety of domains and on a large scale. The World Avatar is based on a dynamic decentralised knowledge graph and on semantic technologies to realise complex cross-domain scenarios. Accurate computational results for such scenarios require the availability of complete, high-quality data. This work focuses on instance matching — one of the subtasks of automatically populating the knowledge graph with data from a wide spectrum of external sources. Instance matching compares two data sets and seeks to identify instances (data, records) referring to the same real-world entity. We introduce AutoCal, a new instance matcher which does not require labelled data and runs out of the box for a wide range of domains without tuning method-specific parameters. AutoCal achieves results competitive to recently proposed unsupervised matchers from the field of Machine Learning. We also select an unsupervised state-of-the-art matcher from the field of Deep Learning for a thorough comparison. Our results show that neither AutoCal nor the state-of-the-art matcher is superior regarding matching quality while AutoCal has only moderate hardware requirements and runs 2.7 to 60 times faster. In summary, AutoCal is specifically well-suited to be used in an automated environment. We present its prototypical integration into the World Avatar and apply AutoCal to the domain of power plants which is relevant for practical environmental scenarios of the World Avatar. 
National Research Foundation (NRF) Published version This project is funded by the National Research Foundation (NRF), Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme. Part of this work was supported by Towards Turing 2.0 under the EPSRC Grant EP/W037211/1 & The Alan Turing Institute. 2024-06-12T07:45:51Z 2024-06-12T07:45:51Z 2024 Journal Article Eibeck, A., Zhang, S., Lim, M. Q. & Kraft, M. (2024). A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants. Journal of Web Semantics, 80, 100815-. https://dx.doi.org/10.1016/j.websem.2024.100815 1570-8268 https://hdl.handle.net/10356/178350 10.1016/j.websem.2024.100815 2-s2.0-85186116420 80 100815 en CREATE Journal of Web Semantics © 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering Semantic web Linked data |
spellingShingle |
Engineering Semantic web Linked data Eibeck, Andreas Zhang, Shaocong Lim, Mei Qi Kraft, Markus A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
description |
Knowledge graphs store and link semantically annotated data about real-world entities from a variety of domains and on a large scale. The World Avatar is based on a dynamic decentralised knowledge graph and on semantic technologies to realise complex cross-domain scenarios. Accurate computational results for such scenarios require the availability of complete, high-quality data. This work focuses on instance matching — one of the subtasks of automatically populating the knowledge graph with data from a wide spectrum of external sources. Instance matching compares two data sets and seeks to identify instances (data, records) referring to the same real-world entity. We introduce AutoCal, a new instance matcher which does not require labelled data and runs out of the box for a wide range of domains without tuning method-specific parameters. AutoCal achieves results competitive to recently proposed unsupervised matchers from the field of Machine Learning. We also select an unsupervised state-of-the-art matcher from the field of Deep Learning for a thorough comparison. Our results show that neither AutoCal nor the state-of-the-art matcher is superior regarding matching quality while AutoCal has only moderate hardware requirements and runs 2.7 to 60 times faster. In summary, AutoCal is specifically well-suited to be used in an automated environment. We present its prototypical integration into the World Avatar and apply AutoCal to the domain of power plants which is relevant for practical environmental scenarios of the World Avatar. |
author2 |
School of Chemical and Biomedical Engineering |
author_facet |
School of Chemical and Biomedical Engineering Eibeck, Andreas Zhang, Shaocong Lim, Mei Qi Kraft, Markus |
format |
Article |
author |
Eibeck, Andreas Zhang, Shaocong Lim, Mei Qi Kraft, Markus |
author_sort |
Eibeck, Andreas |
title |
A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
title_short |
A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
title_full |
A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
title_fullStr |
A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
title_full_unstemmed |
A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
title_sort |
simple and efficient approach to unsupervised instance matching and its application to linked data of power plants |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/178350 |
_version_ |
1806059812394369024 |