Ensemble application of ELM and GPU for real-time multimodal sentiment analysis

Bibliographic Details
Main Authors: Tran, Ha-Nguyen; Cambria, Erik
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Online Access: https://hdl.handle.net/10356/141742
Institution: Nanyang Technological University
Description
Summary: The enormous number of videos posted every day on multimedia websites such as Facebook and YouTube makes the Internet an infinite source of information. Collecting and processing such information, however, is a very challenging task, as it involves dealing with a huge amount of information that is changing at a very high speed. To this end, we leverage the processing speed of the extreme learning machine (ELM) and the graphics processing unit (GPU) to overcome the limitations of standard learning algorithms and the central processing unit (CPU) and, hence, perform real-time multimodal sentiment analysis, i.e., harvesting sentiments from web videos by taking into account audio, visual and textual modalities as sources of information. For sentiment classification, we leveraged sentic memes, i.e., basic units of sentiment whose combination can potentially describe the full range of emotional experiences that are rooted in any of us, including different degrees of polarity. We used both feature-level and decision-level fusion methods to fuse the information extracted from the different modalities. On a sentiment-annotated dataset generated from YouTube video reviews, our proposed multimodal system is shown to achieve an accuracy of 78%. In terms of processing speed, our method shows improvements of several orders of magnitude for feature extraction compared to CPU-based counterparts.
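
The summary distinguishes feature-level from decision-level fusion without giving details. The sketch below only illustrates the general difference between the two ideas and is not the authors' implementation. It assumes feature-level fusion means concatenating per-modality feature vectors before classification, and decision-level fusion means taking a weighted average of per-modality class scores. The function names, modality weights, and example values are all hypothetical.

```python
from typing import Dict, List, Sequence


def feature_level_fusion(features: Dict[str, Sequence[float]],
                         order: Sequence[str]) -> List[float]:
    """Early fusion (assumed form): concatenate the per-modality feature
    vectors in a fixed order, producing one vector for a single classifier."""
    fused: List[float] = []
    for modality in order:
        fused.extend(features[modality])
    return fused


def decision_level_fusion(scores: Dict[str, Sequence[float]],
                          weights: Dict[str, float]) -> List[float]:
    """Late fusion (assumed form): weighted average of the class scores
    produced by one classifier per modality. Weights are hypothetical."""
    total_weight = sum(weights[m] for m in scores)
    n_classes = len(next(iter(scores.values())))
    fused = [0.0] * n_classes
    for modality, class_scores in scores.items():
        for i, value in enumerate(class_scores):
            fused[i] += weights[modality] * value / total_weight
    return fused


def predict(fused_scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose fused score is highest."""
    best = max(range(len(fused_scores)), key=lambda i: fused_scores[i])
    return labels[best]


# Illustrative values only; they are not taken from the article.
features = {"audio": [0.1, 0.2], "visual": [0.3], "text": [0.4, 0.5]}
fused_vector = feature_level_fusion(features, ["audio", "visual", "text"])

scores = {"audio": [0.6, 0.4], "visual": [0.2, 0.8], "text": [0.1, 0.9]}
weights = {"audio": 1.0, "visual": 1.0, "text": 1.0}
fused_scores = decision_level_fusion(scores, weights)
label = predict(fused_scores, ["negative", "positive"])
```

In this sketch, feature-level fusion hands a single combined vector to one classifier, whereas decision-level fusion keeps one classifier per modality and merges only their outputs.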