A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles

Automated keyphrase extraction is crucial for extracting and summarizing relevant information from a variety of publications in multiple domains. However, the extraction of good-quality keyphrases and the summarising of information to a good standard have become extremely challenging in recent resea...

Bibliographic Details
Main Authors: Miah, Mohammad Badrul Alam; Suryanti, Awang; Rahman, Md Mustafizur; A. S. M., Sanwar Hosen; Ra, In-Ho
Format: Article
Language: English
Published: MDPI 2022
Subjects: QA75 Electronic computers. Computer science; TA Engineering (General). Civil engineering (General)
Online Access: http://umpir.ump.edu.my/id/eprint/35105/1/A%20New%20Unsupervised%20Technique%20to%20Analyze.pdf
http://umpir.ump.edu.my/id/eprint/35105/
https://doi.org/10.3390/electronics11172773
Institution: Universiti Malaysia Pahang
id my.ump.umpir.35105
record_format eprints
spelling my.ump.umpir.35105 2022-09-06T08:14:42Z Miah, Mohammad Badrul Alam and Suryanti, Awang and Rahman, Md Mustafizur and A. S. M., Sanwar Hosen and Ra, In-Ho (2022) A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles. Electronics, 11 (17). pp. 1-20. ISSN 2079-9292. MDPI 2022 Article PeerReviewed pdf en cc_by_4
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA75 Electronic computers. Computer science
TA Engineering (General). Civil engineering (General)
description Automated keyphrase extraction is crucial for extracting and summarizing relevant information from publications across multiple domains. However, extracting good-quality keyphrases and summarizing information to a good standard have become extremely challenging in recent research because of the advancement of technology and the exponential growth of digital sources and textual information. As a result, the use of keyphrase features in keyphrase extraction techniques has recently gained considerable popularity. This paper proposes a new unsupervised, region-based keyphrase centroid and frequency analysis technique, named KCFA, for use as a feature in keyphrase extraction. The technique comprises five main processes: dataset collection, data pre-processing, statistical methodologies, curve plotting analysis, and curve fitting. First, the technique collects multiple datasets from diverse sources and passes them through text pre-processing steps. The pre-processed data then go to the region-based statistical methodologies, followed by curve plotting analysis and, lastly, curve fitting. The proposed technique is tested and evaluated using ten (10) best-accessible benchmark datasets from various disciplines. The proposed approach is then compared to our available methods to demonstrate its efficacy, advantages, and importance. The experimental results show that the proposed method works well for analyzing the centroid and frequency of keyphrases in academic articles. It provides a centroid of 706.66 and a frequency of 38.95% in the first region, and 2454.21 and 7.98% in the second region, for a total frequency of 68.11%.
format Article
author Miah, Mohammad Badrul Alam
Suryanti, Awang
Rahman, Md Mustafizur
A. S. M., Sanwar Hosen
Ra, In-Ho
title A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles
publisher MDPI
publishDate 2022
url http://umpir.ump.edu.my/id/eprint/35105/1/A%20New%20Unsupervised%20Technique%20to%20Analyze.pdf
http://umpir.ump.edu.my/id/eprint/35105/
https://doi.org/10.3390/electronics11172773
_version_ 1744353878554443776
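
The abstract in this record reports a centroid and a frequency per region without defining either. A minimal sketch of one plausible reading follows, and it should not be taken as the authors' KCFA implementation. Assumptions (none of them stated in this record): a keyphrase occurrence is represented by its token offset in the article, regions are consecutive half-open offset ranges, the centroid of a region is the mean offset of the occurrences inside it, and the frequency of a region is the percentage of all occurrences that fall inside it. The function name `region_stats`, the boundaries, and the sample offsets are hypothetical.

```python
from statistics import mean


def region_stats(positions, boundaries):
    """Summarize keyphrase offsets per region.

    positions:  token offsets of keyphrase occurrences (hypothetical input).
    boundaries: ascending exclusive upper bounds; region i covers
                [previous bound, boundaries[i]), starting from 0.
    Returns one dict per region with its range, centroid, and frequency (%).
    """
    total = len(positions)
    stats = []
    lower = 0
    for upper in boundaries:
        in_region = [p for p in positions if lower <= p < upper]
        # Assumed definition: centroid = mean offset inside the region.
        centroid = mean(in_region) if in_region else None
        # Assumed definition: frequency = share of all occurrences, in percent.
        frequency = 100.0 * len(in_region) / total if total else 0.0
        stats.append(
            {"range": (lower, upper), "centroid": centroid, "frequency_pct": frequency}
        )
        lower = upper
    return stats


# Made-up offsets for illustration only; not data from the paper.
positions = [10, 30, 50, 1200, 1800, 2600, 2700, 4000]
stats = region_stats(positions, [1500, 3000])

for region in stats:
    print(region)
```

Occurrences beyond the last boundary (here the offset 4000) count toward the total but belong to no region, so the per-region frequencies need not sum to 100% under these assumptions.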