A new text-based w-distance metric to find the perfect match between words

The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. The core of the k-NN algorithm is its distance/similarity function. The performance of the k-NN algorithm varies with the choice of distance function. The traditional distance/s...

Bibliographic Details
Main Authors: Ali, M., Jung, L.T., Hosam, O., Wagan, A.A., Shah, R.A., Khayyat, M.
Format: Article
Published: IOS Press 2020
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b
http://eprints.utp.edu.my/23450/
Institution: Universiti Teknologi Petronas
id my.utp.eprints.23450
record_format eprints
spelling my.utp.eprints.23450 2021-08-19T07:19:54Z IOS Press 2020 Article NonPeerReviewed Ali, M. and Jung, L.T. and Hosam, O. and Wagan, A.A. and Shah, R.A. and Khayyat, M. (2020) A new text-based w-distance metric to find the perfect match between words. Journal of Intelligent and Fuzzy Systems, 38 (3). pp. 2661-2672. http://eprints.utp.edu.my/23450/
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
description The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. The core of the k-NN algorithm is its distance/similarity function, and the algorithm's performance varies with the choice of that function. The traditional distance/similarity functions used in k-NN do not handle mix-mode words well, that is, cases where one string contains multiple substrings or words. Examples are the two-word string 'Employee Name', the one-word string 'Name', and a string of more than one word such as 'Name of Employee'. This ambiguity affects different distance/similarity functions and makes it difficult to find the perfect match between words. To improve the perfect-match calculation in the traditional k-NN algorithm, a new similarity distance metric, named word-distance (w-distance), is developed. The perfect match helps to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity in nature because it handles the dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance has a better impact on the performance of the k-NN algorithm than the Euclidean distance and the cosine similarity. © 2020-IOS Press and the authors. All rights reserved.
format Article
author Ali, M.
Jung, L.T.
Hosam, O.
Wagan, A.A.
Shah, R.A.
Khayyat, M.
title A new text-based w-distance metric to find the perfect match between words
_version_ 1738656473779535872
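
The abstract notes that traditional distance/similarity functions have trouble with strings such as 'Employee Name', 'Name' and 'Name of Employee'. The following is a minimal sketch of that ambiguity, assuming a simple lowercase bag-of-words representation. It does not implement the w-distance metric, whose definition is not given in this record; the function names `tokens`, `cosine_similarity` and `euclidean_distance` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokens(text):
    """Assumed representation: lowercase, whitespace-split word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    ta, tb = tokens(a), tokens(b)
    dot = sum(ta[w] * tb[w] for w in ta)
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty strings in this sketch
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between the word-count vectors of two strings."""
    ta, tb = tokens(a), tokens(b)
    vocabulary = set(ta) | set(tb)
    return math.sqrt(sum((ta[w] - tb[w]) ** 2 for w in vocabulary))


# The three example strings quoted in the abstract.
strings = ["Employee Name", "Name", "Name of Employee"]

for i, first in enumerate(strings):
    for second in strings[i + 1:]:
        print(
            f"{first!r} vs {second!r}: "
            f"cosine={cosine_similarity(first, second):.4f}, "
            f"euclidean={euclidean_distance(first, second):.4f}"
        )
```

Under these assumptions, 'Employee Name' lies at the same Euclidean distance (1.0) from both 'Name' and 'Name of Employee', so a k-NN lookup based on that distance alone cannot tell which of the two is the intended match. This is one possible reading of the ambiguity the abstract describes, not a reproduction of the authors' experiments.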