Feature Selection for Document Classification : Case Study of Meta-heuristic Intelligence and Traditional Approaches

Doctor of Philosophy (Computer Engineering), 2020

Bibliographic Details
Main Author: Khin Sandar Kyaw
Other Authors: Somchai Limsiroratana
Format: Theses and Dissertations
Language: English
Published: Prince of Songkla University 2023
Subjects: Electronic publications; Metaheuristics; Heuristic algorithms
Online Access: http://kb.psu.ac.th/psukb/handle/2016/19118
Institution: Prince of Songkla University
id th-psu.2016-19118
record_format dspace
department Faculty of Engineering, Department of Computer Engineering
type Thesis (2020)
rights Attribution-NonCommercial-NoDerivs 3.0 Thailand http://creativecommons.org/licenses/by-nc-nd/3.0/th/
file_format application/pdf
record_timestamp 2023-12-04T02:24:46Z
abstract Access to news worldwide has shifted from paper to electronic formats, and the publication rate of online newspapers and magazines has increased dramatically. At the same time, text feature selection for automatic document classification (ADC) has become a major challenge because of the unstructured nature of text features, known as the “multi-dimension feature problem”. Although powerful text feature selection schemes continue to be developed, a research gap remains in the “optimization of feature selection problem” (OFSP), which seeks the globally optimal features. Meta-heuristic intelligence for the knowledge discovery process (KDP) has also become critical to overcoming the NP-hard OFSP, because it offers effective performance and efficient computation time. This research therefore proposes a meta-heuristic approach to feature selection optimization that searches for the globally optimal features for ADC.
This thesis presents a case study of meta-heuristic intelligence and traditional approaches to feature selection optimization in document classification. Eleven meta-heuristic algorithms are used to search for the optimal feature subset: Ant Colony search, Artificial Bee Colony search, Bat search, Cuckoo search, Evolutionary search, Elephant search, Firefly search, Flower search, Genetic search, Rhinoceros search, and Wolf search. The results of the proposed model are then compared with three traditional search algorithms: Best First search (BFS), Greedy Stepwise (GS), and Ranker search (RS).
A data mining framework is applied, comprising data preprocessing, feature engineering, building the learning model, and evaluating the proposed meta-heuristic intelligence-based feature selection with various performance and computational complexity measures. Data preprocessing applies tokenization, stop-word handling, stemming and lemmatization, and normalization. Feature engineering uses n-gram TF-IDF feature extraction to build the feature vectors, and both filter and wrapper approaches are applied to observe different cases. Three classifiers, J48, Naïve Bayes, and Support Vector Machine, are used to build the document classification model. According to the results, the proposed system dramatically reduces the number of selected features by discarding those that can degrade the learning model's performance. In addition, the selected global feature subset yields better performance than traditional search under the single objective function of the proposed model.
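The wrapper-style pipeline described in the abstract (n-gram TF-IDF features, a meta-heuristic search over feature subsets, a classifier scoring each subset under a single objective) can be illustrated with a minimal sketch. It is not the thesis implementation. The toy corpus, the function names, the `penalty` weight, and the smoothed IDF formula are all assumptions made for illustration; whitespace tokenization stands in for the full preprocessing stage, and a nearest-centroid classifier with leave-one-out evaluation stands in for J48, Naïve Bayes, and Support Vector Machine.

```python
import math
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and emit all 1..n_max word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_matrix(docs, n_max=2):
    """Return (vocab, rows); rows[d][j] is the TF-IDF weight of vocab[j] in doc d."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    vocab = sorted({t for c in counts for t in c})
    df = Counter(t for c in counts for t in c)  # document frequency per term
    n_docs = len(docs)
    # Smoothed IDF (an assumed variant; other definitions exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty documents
        rows.append([c[t] / total * idf[t] for t in vocab])
    return vocab, rows

def loo_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(labels)):
            members = [rows[k] for k in range(len(rows))
                       if k != i and labels[k] == label]
            if not members:
                continue
            centroid = [sum(m[j] for m in members) / len(members) for j in idx]
            dist = sum((row[j] - c) ** 2 for j, c in zip(idx, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == labels[i]
    return correct / len(rows)

def genetic_search(rows, labels, generations=30, pop_size=20, penalty=0.1, seed=0):
    """Genetic search over boolean feature masks (needs at least two features)."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        # Single objective: accuracy minus a penalty on the fraction of features kept.
        return loo_accuracy(rows, labels, mask) - penalty * sum(mask) / n

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]       # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)         # single bit-flip mutation
            child[flip] = not child[flip]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Hypothetical four-document corpus with two classes.
docs = ["stock market rises", "market shares fall",
        "team wins match", "team loses final match"]
labels = ["finance", "finance", "sport", "sport"]

vocab, rows = tfidf_matrix(docs)
best = genetic_search(rows, labels)
selected = [term for term, keep in zip(vocab, best) if keep]
print(len(selected), "of", len(vocab), "features kept:", selected)
```

A filter approach would instead rank features by a classifier-independent score and keep the top-scoring ones, which is cheaper but ignores how features interact inside the learning model.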
institution Prince of Songkla University
building Khunying Long Athakravi Sunthorn Learning Resources Center
continent Asia
country Thailand
content_provider Khunying Long Athakravi Sunthorn Learning Resources Center
collection PSU Knowledge Bank
language English
topic Electronic publications
Metaheuristics
Heuristic algorithms
description Doctor of Philosophy (Computer Engineering), 2020
author2 Somchai Limsiroratana
author_facet Somchai Limsiroratana
Khin Sandar Kyaw
format Theses and Dissertations
author Khin Sandar Kyaw
author_sort Khin Sandar Kyaw
title Feature Selection for Document Classification : Case Study of Meta-heuristic Intelligence and Traditional Approaches
publisher Prince of Songkla University
publishDate 2023
url http://kb.psu.ac.th/psukb/handle/2016/19118
_version_ 1784859627422220288