Text classification using topic modelling and machine learning

Bibliographic Details
Main Author: Li, Xinyu
Other Authors: S Supraja
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/176723
Institution: Nanyang Technological University
Description
Summary: This report presents a study of topic modelling and text classification through the development and evaluation of a self-developed Latent Dirichlet Allocation (LDA) model. In this project, we used machine learning techniques to evaluate the effect of incorporating various prior types within the developed LDA model. The model was evaluated on three benchmark datasets: 20 Newsgroups, Neural Network Patent Query, and New York Times News Articles. Its performance was assessed using classification reports generated by Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Gaussian Process (GP) classifiers. The classification results demonstrate a clear correlation between the choice of alpha and beta prior types and the quality of the modelled topics. They highlight the potential of custom prior settings to enhance both topic discovery and classification effectiveness. This study contributes to the domains of topic modelling and text classification. It illustrates the practical applicability of advanced topic modelling techniques for improving text classification results and sets the stage for future research into optimizing topic models for diverse analytical tasks.
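The summary describes an LDA model whose alpha (document-topic) and beta (topic-word) Dirichlet priors can be varied. Neither the report's own implementation nor its exact prior types are given in this record. What follows is only a minimal sketch, assuming a standard collapsed Gibbs sampler in pure Python. In it, a symmetric versus an asymmetric alpha vector stands in for "prior types", which is an assumption. The function `lda_gibbs`, its parameters, and the toy corpus are hypothetical. The returned per-document topic proportions are the kind of features that could then be passed to a classifier such as an SVM.

```python
import random
from typing import List, Sequence


def lda_gibbs(
    docs: List[List[int]],
    vocab_size: int,
    num_topics: int,
    alpha: Sequence[float],
    beta: float,
    iterations: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    """Hypothetical collapsed Gibbs sampler for LDA.

    alpha: Dirichlet prior on document-topic proportions, one value per topic
           (equal values = symmetric prior, unequal values = asymmetric prior).
    beta:  symmetric Dirichlet prior on topic-word distributions.
    Returns the posterior-mean topic proportions (theta) for each document.
    """
    if len(alpha) != num_topics:
        raise ValueError("alpha must have one entry per topic")
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    v_beta = vocab_size * beta
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha[t]) * (n_kw[t][w] + beta) / (n_k[t] + v_beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Posterior mean: theta_dk = (n_dk + alpha_k) / (N_d + sum(alpha)).
    alpha_sum = sum(alpha)
    return [
        [(n_dk[d][t] + alpha[t]) / (len(doc) + alpha_sum) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


# Toy corpus of word ids (hypothetical), compared under two alpha priors.
docs = [[0, 1, 0, 2], [3, 4, 3, 4], [0, 2, 1, 1], [4, 3, 5, 5]]
theta_sym = lda_gibbs(docs, vocab_size=6, num_topics=2, alpha=[0.5, 0.5], beta=0.1)
theta_asym = lda_gibbs(docs, vocab_size=6, num_topics=2, alpha=[0.9, 0.1], beta=0.1)
```

Each row of `theta_sym` or `theta_asym` is a probability vector over topics. Stacking these rows gives a document-feature matrix for a downstream classifier. Changing only `alpha` or `beta` while keeping the classifier fixed isolates the effect of the prior choice, which is the comparison the summary reports.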