A tree based keyphrase extraction technique for academic literature

Automatic keyphrase extraction techniques aim to extract high-quality keyphrases that summarize a document at a high level. Of the existing techniques, some are domain-specific and require application domain knowledge, while others are based on higher-order statistical methods and are computati...

Bibliographic Details
Main Author: Rabby, Gollam
Format: Thesis
Language: English
Published: 2019
Subjects: QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/32740/1/A%20tree%20based%20keyphrase%20extraction%20technique%20for%20academic%20literature.wm.pdf
http://umpir.ump.edu.my/id/eprint/32740/
Institution: Universiti Malaysia Pahang
id my.ump.umpir.32740
record_format eprints
spelling my.ump.umpir.32740 2023-01-27T02:28:31Z http://umpir.ump.edu.my/id/eprint/32740/ A tree based keyphrase extraction technique for academic literature Rabby, Gollam QA75 Electronic computers. Computer science 2019-08 Thesis NonPeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/32740/1/A%20tree%20based%20keyphrase%20extraction%20technique%20for%20academic%20literature.wm.pdf Rabby, Gollam (2019) A tree based keyphrase extraction technique for academic literature. Masters thesis, Universiti Malaysia Pahang.
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Rabby, Gollam
A tree based keyphrase extraction technique for academic literature
description Automatic keyphrase extraction techniques aim to extract high-quality keyphrases that summarize a document at a high level. Of the existing techniques, some are domain-specific and require application domain knowledge, some are based on higher-order statistical methods and are computationally expensive, and some require large amounts of training data, which is rarely available for many applications. To overcome these issues, this thesis proposes a new unsupervised automatic keyphrase extraction technique, named TeKET (Tree-based Keyphrase Extraction Technique), which is domain-independent, employs limited statistical knowledge, and requires no training data. The proposed technique also introduces a new variant of the binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. Depending on the candidate keyphrases, the KePhEx tree structure is expanded, shrunk, or maintained. In addition, a measure called the Cohesiveness Index (CI) is derived that denotes the degree of cohesiveness of a given node with respect to the root. CI is used to extract final keyphrases from the resultant tree in a flexible manner and, alongside Term Frequency, to rank keyphrases. The effectiveness of the proposed technique is evaluated experimentally on a benchmark corpus, SemEval-2010, with a total of 244 train and test articles, and compared with other relevant unsupervised techniques, taking representatives from both statistical techniques (such as Term Frequency-Inverse Document Frequency and YAKE) and graph-based techniques (PositionRank, CollabRank (SingleRank), TopicRank, and MultipartiteRank). Three evaluation metrics, namely precision, recall, and F1 score, are considered in the experiments. The results demonstrate improved performance of the proposed technique over the other techniques in terms of precision, recall, and F1 score.
format Thesis
author Rabby, Gollam
author_facet Rabby, Gollam
author_sort Rabby, Gollam
title A tree based keyphrase extraction technique for academic literature
title_short A tree based keyphrase extraction technique for academic literature
title_full A tree based keyphrase extraction technique for academic literature
title_fullStr A tree based keyphrase extraction technique for academic literature
title_full_unstemmed A tree based keyphrase extraction technique for academic literature
title_sort tree based keyphrase extraction technique for academic literature
publishDate 2019
url http://umpir.ump.edu.my/id/eprint/32740/1/A%20tree%20based%20keyphrase%20extraction%20technique%20for%20academic%20literature.wm.pdf
http://umpir.ump.edu.my/id/eprint/32740/
_version_ 1756684410652983296
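
The description above names precision, recall, and F1 score as the evaluation metrics. As a reader's aid, the following is a minimal Python sketch of how these three metrics are commonly computed for one document's extracted keyphrases against a gold-standard list. It is not taken from the thesis: the function name evaluate_keyphrases, the exact-match comparison after lowercasing, and the example phrases are all assumptions made for illustration, and the thesis may well apply other normalization (for example stemming) before matching.

```python
def evaluate_keyphrases(predicted, gold):
    """Return (precision, recall, f1) for one document.

    Assumption: a predicted keyphrase counts as correct only if it matches
    a gold keyphrase exactly after trimming and lowercasing.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    matched = len(pred & ref)  # keyphrases present in both sets

    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, not drawn from SemEval-2010.
predicted = ["keyphrase extraction", "binary tree", "term frequency", "neural network"]
gold = ["Keyphrase Extraction", "binary tree", "cohesiveness index"]

precision, recall, f1 = evaluate_keyphrases(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example two of the four predicted phrases match two of the three gold phrases, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7. Corpus-level scores are usually obtained by averaging such per-document values or by pooling the counts; which of the two the thesis uses is not stated in this record.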