A Comparative Study of Using Bag-of-Words and Word-Embedding Attributes in the Spoiler Classification of English and Thai Text

Bibliographic Details
Main Author: Rangsipan Marukatat
Other Authors: Mahidol University
Format: Conference or Workshop Item
Published: 2020
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/49583
Institution: Mahidol University
Description
Summary: © 2020, Springer Nature Switzerland AG. This research compares the effectiveness of traditional bag-of-words and word-embedding attributes for classifying movie comments as spoiler or non-spoiler. Both approaches were applied to comments in English, an inflectional language, and in Thai, a non-inflectional language. Experimental results suggested that, in terms of classification performance, word embedding was not clearly better than bag of words. Nevertheless, word embedding might still be chosen over bag of words for its scalability. Between Word2Vec and FastText embeddings, Word2Vec was preferable when few out-of-vocabulary (OOV) words were present. Finally, although FastText was expected to help when many OOV words were present, its benefit was hardly observed for Thai.
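The attribute types compared in the summary can be illustrated with a small, self-contained Python sketch. It is not the author's pipeline, and every name in it is hypothetical. The vectors are deterministic stand-ins for trained Word2Vec/FastText weights. Tokens are assumed to be already segmented: Thai is written without spaces between words, so real Thai input would first need a word segmenter. Averaging word vectors into one fixed-length comment vector is also an assumption, chosen here because it is a common way to turn embeddings into classifier attributes. The sketch shows three things: a count-based bag-of-words vector, a whole-word lookup (Word2Vec-style) that yields nothing for an OOV word, and a simplified FastText-style sum of character n-gram vectors that can still produce a vector for that word.

```python
import hashlib
import random
from collections import Counter

DIM = 4  # toy dimensionality; trained embeddings are usually much larger


def bag_of_words(tokens, vocabulary):
    """Count vector with one dimension per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def toy_vector(key, dim=DIM):
    """Hypothetical stand-in for a trained vector: deterministic values seeded by the key."""
    seed = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def char_ngrams(word, n_min=3, n_max=5):
    """Simplified FastText-style subwords: character n-grams of '<word>'."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def word2vec_lookup(word, word_table):
    """Whole-word lookup: an OOV word has no vector (returns None)."""
    return word_table.get(word)


def fasttext_vector(word, ngram_table):
    """Sum the vectors of known subwords, so an OOV word can still get a vector."""
    known = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    if not known:
        return None
    return [sum(column) for column in zip(*known)]


def average_embedding(tokens, embed):
    """Comment-level attributes: mean of the word vectors that exist (zeros if none)."""
    vectors = [v for v in map(embed, tokens) if v is not None]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Usage with a tiny hypothetical training vocabulary.
train_vocab = ["the", "killer", "dies", "at", "end"]
word_table = {w: toy_vector(w) for w in train_vocab}
ngram_table = {g: toy_vector(g) for w in train_vocab for g in char_ngrams(w)}

comment = ["the", "killers", "die", "at", "the", "end"]  # "killers" and "die" are OOV
bow = bag_of_words(comment, train_vocab)
w2v_doc = average_embedding(comment, lambda w: word2vec_lookup(w, word_table))
ft_doc = average_embedding(comment, lambda w: fasttext_vector(w, ngram_table))

print(bow)  # [2, 0, 0, 1, 1]
print(word2vec_lookup("killers", word_table))  # None
print(fasttext_vector("killers", ngram_table) is not None)  # True
```

In this sketch the bag-of-words vector has one dimension per vocabulary term, so its length grows with the vocabulary. The averaged embedding stays at `DIM` dimensions. With trained models, for example gensim's `Word2Vec` and `FastText` classes, their learned vectors would replace `toy_vector`.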