Region-Based Distance Analysis of Keyphrases: A New Unsupervised Method for Extracting Keyphrases Feature from Articles

Bibliographic Details
Main Authors: Miah, Mohammad Badrul Alam; Suryanti, Awang; Azad, Md. Saiful
Format: Conference or Workshop Item
Language: English
Published: IEEE 2021
Online Access: http://umpir.ump.edu.my/id/eprint/33128/7/Region-Based%20Distance%20Analysis%20of%20Keyphrases1.pdf
http://umpir.ump.edu.my/id/eprint/33128/
https://doi.org/10.1109/ICSECS52883.2021.00030
Institution: Universiti Malaysia Pahang
Description
Summary: Due to the exponential growth of information and web sources, automatic keyphrase extraction remains a challenging research problem. Keyphrases are useful for many tasks in natural language processing (NLP) and information retrieval (IR) systems. Feature extraction for keyphrases plays a vital role in extracting high-quality keyphrases and in summarising documents well. This paper proposes region-based distance analysis of keyphrases (RDAK), a new unsupervised technique for extracting keyphrase features from articles. The proposed method comprises six phases: data acquisition and preprocessing, data processing, distance calculation, average distance, curve plotting, and curve fitting. First, the collected datasets are passed to the preprocessing step, which applies text preprocessing techniques. The preprocessed data then goes through the data processing phase; after distance calculation, it is passed to the region-based average calculation, followed by curve plotting analysis and, finally, curve fitting. The proposed system was tested and evaluated on benchmark datasets. It is expected to significantly improve the performance of existing keyphrase extraction techniques.
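
The summary names the phases but does not define them, so the following Python sketch is only one possible reading and should not be taken as the authors' RDAK implementation. It assumes that "distance" means the token gap between consecutive occurrences of a candidate term, that "regions" are equal-width slices of the token sequence, and that curve fitting can be stood in for by an ordinary least-squares line. The stopword list and every function name are hypothetical, and the curve plotting phase is omitted.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (assumed preprocessing)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def occurrence_distances(tokens, term):
    """Return (position, gap) pairs, where gap is the token distance
    from the previous occurrence of `term` (assumed distance measure)."""
    positions = [i for i, t in enumerate(tokens) if t == term]
    return [(cur, cur - prev) for prev, cur in zip(positions, positions[1:])]


def region_averages(tokens, term, n_regions=3):
    """Average gap per region; regions are equal-width slices of the tokens.
    A region with no measured gap yields None."""
    size = max(1, -(-len(tokens) // n_regions))  # ceiling division
    sums = [0.0] * n_regions
    counts = [0] * n_regions
    for pos, gap in occurrence_distances(tokens, term):
        region = min(pos // size, n_regions - 1)
        sums[region] += gap
        counts[region] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def fit_line(points):
    """Ordinary least-squares line through (x, y) points.
    Returns (slope, intercept); a stand-in for the curve fitting phase."""
    if not points:
        raise ValueError("need at least one point to fit")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    sample = "Keyphrase extraction helps search. A keyphrase is a short keyphrase."
    tokens = preprocess(sample)
    averages = region_averages(tokens, "keyphrase", n_regions=2)
    points = [(i, a) for i, a in enumerate(averages) if a is not None]
    if points:
        print(fit_line(points))
```

Under these assumptions, the fitted slope and intercept would serve as per-term features describing how a term's spacing changes across the document. A real implementation would likely operate on multi-word candidate phrases and use a richer curve family than a straight line.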