Text classification using topic modelling and machine learning

This report presents a study that has been centered on topic modelling and text classification through the development and evaluation of a self-developed Latent Dirichlet Allocation model. In this project, we leveraged machine learning techniques to evaluate the effect of incorporating various prior...

وصف كامل

محفوظ في:
التفاصيل البيبلوغرافية
المؤلف الرئيسي: Li, Xinyu
مؤلفون آخرون: S Supraja
التنسيق: Final Year Project
اللغة:English
منشور في: Nanyang Technological University 2024
الموضوعات:
الوصول للمادة أونلاين:https://hdl.handle.net/10356/176723
الوسوم: إضافة وسم
لا توجد وسوم, كن أول من يضع وسما على هذه التسجيلة!
المؤسسة: Nanyang Technological University
اللغة: English
الوصف
الملخص:This report presents a study that has been centered on topic modelling and text classification through the development and evaluation of a self-developed Latent Dirichlet Allocation model. In this project, we leveraged machine learning techniques to evaluate the effect of incorporating various prior types within the developed LDA model. The experiments to evaluate the performance of the developed model were conducted across three benchmark datasets: 20 Newsgroups, Neural Network Patent Query, and New York Times News Articles, and its performance was assessed based on classification reports generated by Support Vector Machines (SVMs), Extreme Learning Machines (ELM), and Gaussian Processes (GP) classifiers. The classification results demonstrate a clear correlation between the choice of alpha and beta prior types and the quality of topics modelled. The results highlight the potential of the custom prior settings to enhance both topic discovery and classification effectiveness. This study contributes to the domains of topic modelling and text classification, illustrating the practical applicability of advanced topic modeling techniques for enhancing text classification results, and setting the stage for future research into the optimization of topic models for diverse analytical tasks.