A new text-based w-distance metric to find the perfect match between words
The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. The core engine of the k-NN algorithm is the distance/similarity function. The performance of the k-NN algorithm varies with the selection of distance function. The traditional distance/s...
Main Authors: | Ali, M.; Jung, L.T.; Hosam, O.; Wagan, A.A.; Shah, R.A.; Khayyat, M. |
---|---|
Format: | Article |
Published: | IOS Press, 2020 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b http://eprints.utp.edu.my/23450/ |
Institution: | Universiti Teknologi Petronas |
id |
my.utp.eprints.23450 |
---|---|
record_format |
eprints |
spelling |
my.utp.eprints.23450 2021-08-19T07:19:54Z A new text-based w-distance metric to find the perfect match between words Ali, M. Jung, L.T. Hosam, O. Wagan, A.A. Shah, R.A. Khayyat, M. The k-NN algorithm is an instance-based learning algorithm which is widely used in the data mining applications. The core engine of the k-NN algorithm is the distance/similarity function. The performance of the k-NN algorithm varies with the selection of distance function. The traditional distance/similarity functions in k-NN do not perfectly handle the mix-mode words such as when one string has multiple substrings/words. For example, a two-word string of 'Employee Name', a one-word string of 'Name' or more than one word such as, 'Name of Employee'. This ambiguity is faced by different distance/similarity functions causing difficulties in finding the perfect match of words. To improve the perfect-match calculation functionality in the traditional k-NN algorithm, a new similarity distance metric is developed and named as word-distance (w-distance). The perfect match will help us to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity in nature because it is to handle dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance has a better impact on the performance of the k-NN algorithm as compared to the Euclidean distance and the cosine similarity. © 2020-IOS Press and the authors. All rights reserved. IOS Press 2020 Article NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b Ali, M. and Jung, L.T. and Hosam, O. and Wagan, A.A. and Shah, R.A. and Khayyat, M. (2020) A new text-based w-distance metric to find the perfect match between words. Journal of Intelligent and Fuzzy Systems, 38 (3). pp. 2661-2672. http://eprints.utp.edu.my/23450/ |
institution |
Universiti Teknologi Petronas |
building |
UTP Resource Centre |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Petronas |
content_source |
UTP Institutional Repository |
url_provider |
http://eprints.utp.edu.my/ |
description |
The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. The core engine of the k-NN algorithm is the distance/similarity function, and the algorithm's performance varies with the choice of that function. The traditional distance/similarity functions in k-NN do not perfectly handle mix-mode words, such as when one string contains multiple substrings/words: for example, the two-word string 'Employee Name', the one-word string 'Name', or a longer string such as 'Name of Employee'. Different distance/similarity functions face this ambiguity, which makes it difficult to find the perfect match between words. To improve perfect-match calculation in the traditional k-NN algorithm, a new similarity distance metric named word-distance (w-distance) is developed. The perfect match helps to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity because it handles the dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance improves the performance of the k-NN algorithm compared with the Euclidean distance and the cosine similarity. © 2020-IOS Press and the authors. All rights reserved. |
format |
Article |
author |
Ali, M. Jung, L.T. Hosam, O. Wagan, A.A. Shah, R.A. Khayyat, M. |
spellingShingle |
Ali, M. Jung, L.T. Hosam, O. Wagan, A.A. Shah, R.A. Khayyat, M. A new text-based w-distance metric to find the perfect match between words |
author_facet |
Ali, M. Jung, L.T. Hosam, O. Wagan, A.A. Shah, R.A. Khayyat, M. |
author_sort |
Ali, M. |
title |
A new text-based w-distance metric to find the perfect match between words |
title_short |
A new text-based w-distance metric to find the perfect match between words |
title_full |
A new text-based w-distance metric to find the perfect match between words |
title_fullStr |
A new text-based w-distance metric to find the perfect match between words |
title_full_unstemmed |
A new text-based w-distance metric to find the perfect match between words |
title_sort |
new text-based w-distance metric to find the perfect match between words |
publisher |
IOS Press |
publishDate |
2020 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85081537860&doi=10.3233%2fJIFS-179552&partnerID=40&md5=2f8ba723a873404c83134a9f0365fe1b http://eprints.utp.edu.my/23450/ |
_version_ |
1738656473779535872 |
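
The description field above says that traditional distance/similarity functions struggle with strings such as 'Employee Name', 'Name' and 'Name of Employee'. The following is a minimal sketch of that ambiguity only; it does not implement the paper's w-distance, whose definition is not given in this record. It assumes a plain bag-of-words representation (lowercased, whitespace-tokenised word counts), and the helper names `bag_of_words`, `cosine_similarity` and `euclidean_distance` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Assumed representation: lowercased, whitespace-tokenised word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    vocab = va.keys() | vb.keys()
    # Counter returns 0 for words that are absent from one of the strings.
    return math.sqrt(sum((va[w] - vb[w]) ** 2 for w in vocab))


if __name__ == "__main__":
    query = "Employee Name"
    for candidate in ["Name", "Name of Employee", "Employee Name"]:
        print(
            f"{query!r} vs {candidate!r}: "
            f"cosine={cosine_similarity(query, candidate):.3f}, "
            f"euclidean={euclidean_distance(query, candidate):.3f}"
        )
```

Under these assumptions, 'Employee Name' is equally far (in Euclidean terms) from 'Name' and from 'Name of Employee', because each differs from it by exactly one word. Ties of this kind are one way a nearest-neighbour lookup can fail to single out a perfect match.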