Theme identification using machine learning techniques

Bibliographic Details
Main Authors:	Jayady, Siti Hajar; Antong, Hasmawati
Format:	Article
Language:	English
Published:	ASASI 2021
Subjects:	TK7885 Computer engineering
Online Access:	http://irep.iium.edu.my/104247/2/104247_Theme%20identification.pdf http://irep.iium.edu.my/104247/ https://asasijournal.id/index.php/jiae/article/view/24 https://doi.org/10.51662/jiae.v1i2.24
Institution:	Universiti Islam Antarabangsa Malaysia

Description
Summary:	With the abundance of online research platforms, much information presented in PDF files, such as articles and journals, can be obtained easily. As a result, students completing research projects often have many downloaded PDF articles on their laptops. However, identifying the target articles manually within the collection can be tiring, as most articles consist of several pages that need to be analyzed. Reading each article to determine whether it relates to a theme, and then organizing the articles by theme, is time- and energy-consuming. To address this problem, a PDF file organizer that implements a theme identifier is necessary. This work therefore focuses on automatic text classification using machine learning methods to build a theme identifier, employed in the PDF file organizer, that classifies articles into two themes: augmented reality and machine learning. A total of 1000 text documents for both themes were used to build the classification model. A preprocessing step for data cleaning was performed, followed by TF-IDF feature extraction for text vectorization and to reduce sparse vectors. 80% of the dataset was used for training, and the remainder was used to validate the trained models. The classification models proposed in this work are Linear SVM and Multinomial Naïve Bayes. The accuracy of the models was evaluated using a confusion matrix. For the Linear SVM model, grid-search optimization was performed to determine the optimal value of the Cost parameter.
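
The pipeline described in the summary (preprocessing, TF-IDF vectorization, an 80/20 split, a Multinomial Naïve Bayes classifier, and a confusion matrix) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the ten-document toy corpus, the stopword list, the smoothed IDF formula, the Laplace smoothing value, and all function names are assumptions made for illustration. The Linear SVM model and its grid search over the Cost parameter are not shown.

```python
import math
from collections import Counter

# Toy corpus invented for illustration (the paper reports 1000 documents).
DOCS = [
    ("augmented reality overlays virtual objects on a camera view", "AR"),
    ("a headset renders virtual objects for augmented reality", "AR"),
    ("marker tracking aligns virtual objects with the camera view", "AR"),
    ("augmented reality headset tracking for mobile camera apps", "AR"),
    ("the display overlays virtual labels on the camera image", "AR"),
    ("machine learning models learn patterns from training data", "ML"),
    ("a classifier is trained on labeled training data", "ML"),
    ("neural network models need large training data sets", "ML"),
    ("machine learning classifier accuracy depends on features", "ML"),
    ("supervised learning fits models to labeled data", "ML"),
]
# Assumed stopword list; the paper's cleaning steps are not specified here.
STOPWORDS = {"a", "the", "on", "for", "is", "to", "from", "with", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (stand-in for preprocessing)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_idf(token_lists):
    """Smoothed inverse document frequency learned from the training documents."""
    n_docs = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen during training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    classes = sorted(set(labels))
    log_prior, log_like = {}, {}
    for c in classes:
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(vectors))
        sums = Counter()
        for v in rows:
            sums.update(v)  # accumulates the per-term weights of this class
        total = sum(sums.values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((sums[t] + alpha) / total) for t in vocab}
    return classes, log_prior, log_like


def predict(vector, model):
    """Return the class with the highest log posterior score."""
    classes, log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(w * log_like[c][t] for t, w in vector.items())
        for c in classes
    }
    return max(scores, key=scores.get)


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# 80/20 split: the first four documents of each theme train, the fifth validates.
train = [d for i, d in enumerate(DOCS) if i % 5 != 4]
test = [d for i, d in enumerate(DOCS) if i % 5 == 4]

train_tokens = [tokenize(text) for text, _ in train]
idf = fit_idf(train_tokens)
train_vectors = [tfidf(tokens, idf) for tokens in train_tokens]
model = train_nb(train_vectors, [y for _, y in train], vocab=list(idf))

y_true = [y for _, y in test]
y_pred = [predict(tfidf(tokenize(text), idf), model) for text, _ in test]
cm = confusion_matrix(y_true, y_pred, model[0])
accuracy = sum(cm[i][i] for i in range(len(cm))) / len(test)
print(cm, accuracy)
```

The IDF weights are fitted on the training split only, so the validation documents do not influence the vocabulary or the weights. With a corpus this small the accuracy figure says nothing about real performance; it only shows how the diagonal of the confusion matrix yields the accuracy.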
