KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction
Automatic keyphrase extraction remains a significant and difficult issue in the current research domain because of the exponential explosion of information and internet sources. Various activities involving natural language processing and information retrieval systems greatly benefit from the use of...
Main Authors: | Alam Miah, Mohammad Badrul; Suryanti, Awang |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2022 |
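Subjects: | QA75 Electronic computers. Computer science; QA76 Computer software; T Technology (General); TA Engineering (General). Civil engineering (General) |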
Online Access: | http://umpir.ump.edu.my/id/eprint/36844/1/KDA%20_%20An%20unsupervised%20approach%20for%20analyzing%20keyphrases%20distance%20from%20news%20articles%20as%20a%20feature%20of%20keyphrase%20extraction.pdf http://umpir.ump.edu.my/id/eprint/36844/ https://ncon-pgr.ump.edu.my/index.php/en/?option=com_fileman&view=file&routed=1&name=E-BOOK%20NCON%202022%20.pdf&folder=E-BOOK%20NCON%202022&container=fileman-files |
Institution: | Universiti Malaysia Pahang Al-Sultan Abdullah |
id | my.ump.umpir.36844 |
---|---|
record_format | eprints |
spelling | my.ump.umpir.36844 2024-01-04T01:25:04Z http://umpir.ump.edu.my/id/eprint/36844/ 2022-11-15 Conference or Workshop Item PeerReviewed pdf en Alam Miah, Mohammad Badrul and Suryanti, Awang (2022) KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction. In: The 6th National Conference for Postgraduate Research (NCON-PGR 2022), 15 November 2022, Virtual Conference, Universiti Malaysia Pahang, Malaysia. p. 83. https://ncon-pgr.ump.edu.my/index.php/en/?option=com_fileman&view=file&routed=1&name=E-BOOK%20NCON%202022%20.pdf&folder=E-BOOK%20NCON%202022&container=fileman-files |
institution | Universiti Malaysia Pahang Al-Sultan Abdullah |
building | UMPSA Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Malaysia Pahang Al-Sultan Abdullah |
content_source | UMPSA Institutional Repository |
url_provider | http://umpir.ump.edu.my/ |
language | English |
topic | QA75 Electronic computers. Computer science; QA76 Computer software; T Technology (General); TA Engineering (General). Civil engineering (General) |
description | Automatic keyphrase extraction remains a significant and difficult research problem because of the exponential growth of information and internet sources. Many natural language processing and information retrieval tasks benefit greatly from keyphrases. Extracting the best keyphrases and summarizing documents well depends on the features extracted for those keyphrases. This paper proposes an unsupervised, region-based KDA technique for analyzing the distance of keyphrases in news articles as a feature for keyphrase extraction. The proposed technique is divided into eight phases: data collection, data pre-processing, data processing, keyphrase searching, distance calculating, distance averaging, curve-plotting, and curve-fitting. First, two different datasets of news articles are collected and passed to the data pre-processing step, which applies several pre-processing algorithms. The pre-processed data then enters the data processing stage, where it goes through the keyphrase searching, distance calculation, and distance averaging steps. Curve plotting analysis is then applied, followed by curve fitting. The performance of the proposed technique is tested and evaluated on two of the most accessible benchmark datasets, and the method is compared with other available methods to demonstrate its efficiency, advantages, and importance. The experimental results show that the proposed approach efficiently analyzed the keyphrase distance from news articles, produced an F1-score of 96.91%, presented keyphrases of 94.55%, and greatly improved the effectiveness of current keyphrase extraction methods. |
format | Conference or Workshop Item |
author | Alam Miah, Mohammad Badrul; Suryanti, Awang |
title | KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction |
publishDate | 2022 |
url | http://umpir.ump.edu.my/id/eprint/36844/1/KDA%20_%20An%20unsupervised%20approach%20for%20analyzing%20keyphrases%20distance%20from%20news%20articles%20as%20a%20feature%20of%20keyphrase%20extraction.pdf http://umpir.ump.edu.my/id/eprint/36844/ https://ncon-pgr.ump.edu.my/index.php/en/?option=com_fileman&view=file&routed=1&name=E-BOOK%20NCON%202022%20.pdf&folder=E-BOOK%20NCON%202022&container=fileman-files |
_version_ | 1822924020239040512 |
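
The description field names keyphrase searching, distance calculating, distance averaging, and curve-fitting phases, but this record does not define how any of them is computed. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes "distance" means the relative offset of a keyphrase's first occurrence from the start of the article, that "regions" are equal-width slices of the article, and that a straight least-squares line stands in for the unspecified curve model. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def keyphrase_distance(tokens: List[str], phrase: str) -> Optional[float]:
    """Keyphrase searching + distance calculating (assumed definition).

    Returns the relative offset of the first occurrence of `phrase`
    (0.0 = first token of the article), or None if the phrase is absent.
    """
    target = tokenize(phrase)
    if not target or len(target) > len(tokens):
        return None
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return i / len(tokens)
    return None


def region_shares(distances: List[float], regions: int) -> List[float]:
    """Distance averaging (assumed): share of keyphrases per equal-width region."""
    counts = [0] * regions
    for d in distances:
        # Clamp so a distance of exactly 1.0 would land in the last region.
        counts[min(int(d * regions), regions - 1)] += 1
    total = len(distances)
    return [c / total if total else 0.0 for c in counts]


def fit_line(ys: List[float]) -> Tuple[float, float]:
    """Curve fitting (assumed model): least-squares line through (i, ys[i]).

    Requires at least two points. Returns (slope, intercept).
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy article and keyphrases, invented purely for illustration.
article = (
    "Flood warning issued for Pahang. Heavy rain continues as the "
    "flood warning remains in place across the state."
)
tokens = tokenize(article)
found = [keyphrase_distance(tokens, p) for p in ("flood warning", "heavy rain", "state")]
distances = [d for d in found if d is not None]
shares = region_shares(distances, regions=3)
slope, intercept = fit_line(shares)
```

In this reading, a negative slope would indicate that keyphrases cluster toward the start of an article, which is the kind of positional signal a distance feature could pass to an extraction method. Whether the paper uses this definition, this region scheme, or a different curve family cannot be determined from the record.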