Keyphrase distance analysis technique from news articles as a feature for keyphrase extraction: An unsupervised approach

Driven by the rapid growth of online information sources, automatic keyphrase extraction remains an important and challenging research problem. Keyphrases benefit many tasks, including information retrieval (IR) systems and natural language pro...

Bibliographic Details
Main Authors: Miah, Mohammad Badrul Alam; Suryanti, Awang; Rahman, Md Mustafizur; Sanwar Hosen, A. S. M.
Format: Article
Language: English
Published: The Science and Information (SAI) Organization Limited 2023
Subjects: QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/39116/1/Keyphrase%20distance%20analysis%20technique%20from%20news%20articles%20as%20a%20feature%20for%20keyphrase%20extraction.pdf
http://umpir.ump.edu.my/id/eprint/39116/
https://doi.org/10.14569/IJACSA.2023.01410104
Institution: Universiti Malaysia Pahang Al-Sultan Abdullah
Language: English
id my.ump.umpir.39116
record_format eprints
spelling my.ump.umpir.39116 2024-01-05T07:41:37Z http://umpir.ump.edu.my/id/eprint/39116/ Keyphrase distance analysis technique from news articles as a feature for keyphrase extraction: An unsupervised approach Miah, Mohammad Badrul Alam Suryanti, Awang Rahman, Md Mustafizur Sanwar Hosen, A. S. M. QA75 Electronic computers. Computer science The Science and Information (SAI) Organization Limited 2023 Article PeerReviewed pdf en cc_by_4 http://umpir.ump.edu.my/id/eprint/39116/1/Keyphrase%20distance%20analysis%20technique%20from%20news%20articles%20as%20a%20feature%20for%20keyphrase%20extraction.pdf Miah, Mohammad Badrul Alam and Suryanti, Awang and Rahman, Md Mustafizur and Sanwar Hosen, A. S. M. (2023) Keyphrase distance analysis technique from news articles as a feature for keyphrase extraction: An unsupervised approach. International Journal of Advanced Computer Science and Applications (IJACSA), 14 (10). pp. 995-1002. ISSN 2156-5570 (Online). (Published) https://doi.org/10.14569/IJACSA.2023.01410104
institution Universiti Malaysia Pahang Al-Sultan Abdullah
building UMPSA Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Miah, Mohammad Badrul Alam
Suryanti, Awang
Rahman, Md Mustafizur
Sanwar Hosen, A. S. M.
Keyphrase distance analysis technique from news articles as a feature for keyphrase extraction: An unsupervised approach
description Driven by the rapid growth of online information sources, automatic keyphrase extraction remains an important and challenging research problem. Keyphrases benefit many tasks, including information retrieval (IR) systems and natural language processing (NLP). Extracting the features of keyphrases is essential both for identifying the most significant keyphrases and for summarizing texts to a high standard. To analyze the distance between keyphrases in news articles as a keyphrase feature, this research proposes a region-based unsupervised keyphrase distance analysis (KDA) technique. The method consists of eight steps: data gathering, data preprocessing, data processing, keyphrase searching, distance calculation, distance averaging, curve plotting, and curve fitting. The approach begins by gathering two distinct datasets of news articles, which are passed to the preprocessing step, where a few preprocessing techniques are applied. The preprocessed data then enters the data processing phase, where it is routed to the keyphrase searching, distance computation, and distance averaging phases. Finally, curve fitting is applied after a curve plotting analysis. The same two benchmark datasets are used to evaluate and test the performance of the proposed approach, which is then compared with other approaches to show its effectiveness. The evaluation results show that the proposed technique considerably improves the efficiency of keyphrase extraction techniques. It produces an F1-score of 96.91%, whereas its present keyphrases are 94.55%.
format Article
author Miah, Mohammad Badrul Alam
Suryanti, Awang
Rahman, Md Mustafizur
Sanwar Hosen, A. S. M.
author_facet Miah, Mohammad Badrul Alam
Suryanti, Awang
Rahman, Md Mustafizur
Sanwar Hosen, A. S. M.
author_sort Miah, Mohammad Badrul Alam
title Keyphrase distance analysis technique from news articles as a feature for keyphrase extraction: An unsupervised approach
title_sort keyphrase distance analysis technique from news articles as a feature for keyphrase extraction: an unsupervised approach
publisher The Science and Information (SAI) Organization Limited
publishDate 2023
url http://umpir.ump.edu.my/id/eprint/39116/1/Keyphrase%20distance%20analysis%20technique%20from%20news%20articles%20as%20a%20feature%20for%20keyphrase%20extraction.pdf
http://umpir.ump.edu.my/id/eprint/39116/
https://doi.org/10.14569/IJACSA.2023.01410104
_version_ 1822924028331950080
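
The keyphrase searching, distance calculation, distance averaging, and curve fitting steps named in the description can be illustrated with a minimal Python sketch. The record does not define how the authors measure distance or which curve they fit, so everything here is an assumption made for illustration: distance is taken as the token gap between the first occurrences of consecutive keyphrases, the fit is an ordinary least-squares straight line, and the names tokenize, find_positions, average_keyphrase_distance, and fit_line are hypothetical. It is not the authors' KDA implementation.

```python
import re
from statistics import mean


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_positions(tokens, keyphrase):
    """Keyphrase searching: token indices where the keyphrase starts."""
    phrase = tokenize(keyphrase)
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def average_keyphrase_distance(text, keyphrases):
    """Distance calculation and averaging.

    Assumption: the distance is the gap, in tokens, between the first
    occurrences of consecutive keyphrases. Returns None when fewer than
    two keyphrases occur in the text.
    """
    tokens = tokenize(text)
    first_hits = []
    for keyphrase in keyphrases:
        positions = find_positions(tokens, keyphrase)
        if positions:
            first_hits.append(positions[0])
    first_hits.sort()
    if len(first_hits) < 2:
        return None
    gaps = [b - a for a, b in zip(first_hits, first_hits[1:])]
    return mean(gaps)


def fit_line(xs, ys):
    """Curve fitting: least-squares line y = slope * x + intercept.

    Assumption: a straight line stands in for whatever curve the paper fits.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    article = (
        "The central bank raised interest rates today. Analysts said "
        "interest rates may rise again as inflation stays high."
    )
    phrases = ["central bank", "interest rates", "inflation"]
    print(average_keyphrase_distance(article, phrases))
```

In a fuller pipeline, one averaged distance per article could be paired with another per-article quantity (for example, article length in tokens, which is also an assumption) and passed to fit_line; the resulting points are what a curve plotting step would display.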