KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction

Automatic keyphrase extraction remains a significant and difficult issue in the current research domain because of the exponential explosion of information and internet sources. Various activities involving natural language processing and information retrieval systems greatly benefit from the use of...

Bibliographic Details
Main Authors: Alam Miah, Mohammad Badrul; Suryanti, Awang
Format: Conference or Workshop Item
Language: English
Published: 2022
Online Access: http://umpir.ump.edu.my/id/eprint/36844/1/KDA%20_%20An%20unsupervised%20approach%20for%20analyzing%20keyphrases%20distance%20from%20news%20articles%20as%20a%20feature%20of%20keyphrase%20extraction.pdf
http://umpir.ump.edu.my/id/eprint/36844/
https://ncon-pgr.ump.edu.my/index.php/en/?option=com_fileman&view=file&routed=1&name=E-BOOK%20NCON%202022%20.pdf&folder=E-BOOK%20NCON%202022&container=fileman-files
Institution: Universiti Malaysia Pahang Al-Sultan Abdullah
id my.ump.umpir.36844
record_format eprints
Citation: Alam Miah, Mohammad Badrul and Suryanti, Awang (2022) KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction. In: The 6th National Conference for Postgraduate Research (NCON-PGR 2022), 15 November 2022, Virtual Conference, Universiti Malaysia Pahang, Malaysia. p. 83.
Status: Peer reviewed
institution Universiti Malaysia Pahang Al-Sultan Abdullah
building UMPSA Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA75 Electronic computers. Computer science
QA76 Computer software
T Technology (General)
TA Engineering (General). Civil engineering (General)
description Automatic keyphrase extraction remains a significant and difficult research problem because of the exponential growth of information and internet sources. Many natural language processing and information retrieval tasks benefit greatly from keyphrases. Extracting the best keyphrases, and summarizing documents to a high standard, depends on the features extracted for those keyphrases. This paper proposes an unsupervised, region-based KDA technique that analyzes the distance of keyphrases in news articles as a feature for keyphrase extraction. The technique has eight phases: data collection, data pre-processing, data processing, keyphrase searching, distance calculation, distance averaging, curve plotting, and curve fitting. First, two datasets of news articles are collected and passed through a pre-processing step that applies a few pre-processing algorithms. The pre-processed data then enters the data processing stage, which runs keyphrase searching, distance calculation, and distance averaging in turn. Curve plotting is applied next, followed by curve fitting. The technique is evaluated on two of the most accessible benchmark datasets and compared with other available methods to demonstrate its efficiency, advantages, and importance. The experimental results show that the proposed approach efficiently analyzed the keyphrase distance from news articles, produced an F1-score of 96.91%, and presented keyphrases of 94.55%, as well as greatly improved the effectiveness of current keyphrase extraction methods.
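The abstract names the phases but does not define them, so the following is only a minimal sketch of what keyphrase searching, distance calculation, distance averaging, and curve fitting could look like. Everything in it is an assumption for illustration: "distance" is taken to be the token offset of a keyphrase's first occurrence divided by article length, "regions" are equal-width slices of the article, and the fitted curve is a straight least-squares line. The function names (`tokenize`, `first_offset`, `relative_distances`, `region_histogram`, `fit_line`) are hypothetical and are not taken from the paper. Curve plotting is left out to keep the sketch dependency-free.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (stand-in for pre-processing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_offset(doc_tokens, phrase_tokens):
    """Token index of the first occurrence of the phrase, or None if absent."""
    n = len(phrase_tokens)
    if n == 0:
        return None
    for i in range(len(doc_tokens) - n + 1):
        if doc_tokens[i:i + n] == phrase_tokens:
            return i
    return None


def relative_distances(article, keyphrases):
    """Assumed distance: first-occurrence offset / article length, in [0, 1)."""
    doc = tokenize(article)
    distances = []
    for phrase in keyphrases:
        pos = first_offset(doc, tokenize(phrase))
        if pos is not None:
            distances.append(pos / len(doc))
    return distances


def region_histogram(distances, regions=4):
    """Count keyphrases per equal-width region of the article (assumed binning)."""
    counts = Counter(min(int(d * regions), regions - 1) for d in distances)
    return [counts.get(r, 0) for r in range(regions)]


def fit_line(ys):
    """Least-squares line y = a + b*x over x = 0..len(ys)-1; returns (a, b)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Illustrative input only; not data from the paper.
article = (
    "Flood warnings were issued in Pahang on Monday. Rivers rose quickly "
    "after heavy rain. Officials said the flood warnings would remain "
    "until the heavy rain eases."
)
keyphrases = ["flood warnings", "heavy rain", "Pahang"]

distances = relative_distances(article, keyphrases)
average = sum(distances) / len(distances)      # distance averaging
histogram = region_histogram(distances)         # keyphrases per region
intercept, slope = fit_line(histogram)          # curve fitting (linear here)
print(average, histogram, intercept, slope)
```

A negative slope in this sketch would indicate that keyphrases cluster toward the start of the article, which is the kind of positional signal a distance feature is meant to capture; whether the paper fits a line or another curve family is not stated in the abstract.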