Text classification using topic modelling and machine learning
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/176723
Institution: Nanyang Technological University
Summary: This report presents a study centered on topic modelling and text classification through the development and evaluation of a self-developed Latent Dirichlet Allocation (LDA) model. In this project, we leveraged machine learning techniques to evaluate the effect of incorporating various prior types within the developed LDA model.
The experiments evaluating the developed model were conducted across three benchmark datasets: 20 Newsgroups, Neural Network Patent Query, and New York Times News Articles. Performance was assessed using classification reports generated by Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Gaussian Process (GP) classifiers.
The classification results demonstrate a clear correlation between the choice of alpha and beta prior types and the quality of the modelled topics, highlighting the potential of custom prior settings to enhance both topic discovery and classification effectiveness. This study contributes to the domains of topic modelling and text classification by illustrating the practical applicability of advanced topic modelling techniques for improving text classification results, and it sets the stage for future research into the optimization of topic models for diverse analytical tasks.
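The pipeline the summary describes (document-topic features from an LDA model with chosen alpha/beta priors, fed to a downstream classifier) can be sketched as follows. This is a minimal illustration using scikit-learn's `LatentDirichletAllocation` as a stand-in for the report's self-developed model; the toy documents, labels, prior values, and the choice of a linear SVM are illustrative assumptions, not the report's actual settings or results.

```python
# Sketch: bag-of-words -> LDA topic proportions -> SVM on topic features.
# doc_topic_prior and topic_word_prior correspond to the symmetric alpha
# and beta priors whose choice the study evaluates (values are assumed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# Toy corpus standing in for a benchmark dataset such as 20 Newsgroups.
docs = [
    "the network learns weights by gradient descent",
    "neural networks classify images with many layers",
    "the senate passed the budget bill yesterday",
    "the election results were announced by officials",
]
labels = [0, 0, 1, 1]  # 0 = technology, 1 = politics (toy labels)

pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("lda", LatentDirichletAllocation(
        n_components=2,         # number of latent topics
        doc_topic_prior=0.1,    # alpha: prior over topics per document
        topic_word_prior=0.01,  # beta: prior over words per topic
        random_state=0,
    )),
    ("svm", SVC(kernel="linear")),
])

pipeline.fit(docs, labels)
preds = pipeline.predict(docs)
print(preds)
```

Varying `doc_topic_prior` and `topic_word_prior` and comparing the resulting classification reports mirrors, in outline, the study's evaluation of different prior types.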