A new text-based w-distance metric to find the perfect match between words

The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. The core engine of the k-NN algorithm is the distance/similarity function, and the algorithm's performance varies with the choice of that function. The traditional distance/s...

Bibliographic Details
Main Authors:	Ali, M., Jung, L.T., Hosam, O., Wagan, A.A., Shah, R.A., Khayyat, M.
Format:	Article
Published:	IOS Press 2020
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b http://eprints.utp.edu.my/23450/
Institution:	Universiti Teknologi Petronas

id	my.utp.eprints.23450
record_format	eprints
spelling	my.utp.eprints.23450 2021-08-19T07:19:54Z A new text-based w-distance metric to find the perfect match between words Ali, M. Jung, L.T. Hosam, O. Wagan, A.A. Shah, R.A. Khayyat, M. The k-NN algorithm is an instance-based learning algorithm which is widely used in the data mining applications. The core engine of the k-NN algorithm is the distance/similarity function. The performance of the k-NN algorithm varies with the selection of distance function. The traditional distance/similarity functions in k-NN do not perfectly handle the mix-mode words such as when one string has multiple substrings/words. For example, a two-word string of 'Employee Name', a one-word string of 'Name' or more than one word such as, 'Name of Employee'. This ambiguity is faced by different distance/similarity functions causing difficulties in finding the perfect match of words. To improve the perfect-match calculation functionality in the traditional k-NN algorithm, a new similarity distance metric is developed and named as word-distance (w-distance). The perfect match will help us to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity in nature because it is to handle dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance has a better impact on the performance of the k-NN algorithm as compared to the Euclidean distance and the cosine similarity. © 2020 IOS Press and the authors. All rights reserved. IOS Press 2020 Article NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b Ali, M. and Jung, L.T. and Hosam, O. and Wagan, A.A. and Shah, R.A. and Khayyat, M. (2020) A new text-based w-distance metric to find the perfect match between words. Journal of Intelligent and Fuzzy Systems, 38 (3). pp. 2661-2672. http://eprints.utp.edu.my/23450/
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. Its core engine is the distance/similarity function, and its performance varies with the choice of that function. The traditional distance/similarity functions used in k-NN do not handle mix-mode words well, that is, cases where one string contains multiple substrings or words: for example, the two-word string 'Employee Name', the one-word string 'Name', or a longer string such as 'Name of Employee'. This ambiguity affects different distance/similarity functions and makes it difficult to find the perfect match between words. To improve perfect-match calculation in the traditional k-NN algorithm, a new similarity distance metric, named word-distance (w-distance), is developed. A perfect match helps to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity because it handles the dissimilarity and similarity features of strings at the same time. Simulation results showed that w-distance improves the performance of the k-NN algorithm compared with the Euclidean distance and the cosine similarity. © 2020 IOS Press and the authors. All rights reserved.
format	Article
author	Ali, M. Jung, L.T. Hosam, O. Wagan, A.A. Shah, R.A. Khayyat, M.
author_sort	Ali, M.
title	A new text-based w-distance metric to find the perfect match between words
title_sort	new text-based w-distance metric to find the perfect match between words
publisher	IOS Press
publishDate	2020
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b http://eprints.utp.edu.my/23450/
_version_	1738656473779535872
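
The ambiguity described in the abstract can be illustrated with a small sketch. It is not the w-distance from the article, whose definition does not appear in this record; it only applies the two baselines the abstract names, Euclidean distance and cosine similarity, to a bag-of-words view of the three example strings. The lowercase whitespace tokenization and the helper names (`tokens`, `cosine_similarity`, `euclidean_distance`) are assumptions made for the illustration.

```python
import math
from collections import Counter


def tokens(text):
    # Assumption: lowercase, whitespace-separated words are the features.
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the two word-count vectors.
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    # Euclidean distance between the two word-count vectors.
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    vocabulary = set(va) | set(vb)
    return math.sqrt(sum((va[t] - vb[t]) ** 2 for t in vocabulary))


if __name__ == "__main__":
    pairs = [
        ("Employee Name", "Name"),
        ("Employee Name", "Name of Employee"),
        ("Name", "Name of Employee"),
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: "
              f"cosine={cosine_similarity(a, b):.3f}, "
              f"euclidean={euclidean_distance(a, b):.3f}")
```

Under these assumptions, no pair scores as an exact match under either measure (cosine 0.707, 0.816 and 0.577), and the Euclidean distance is 1.000 for both of the first two pairs, so it cannot tell them apart. This is the kind of difficulty with multi-word strings that the abstract says motivated the new metric.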
