Multi modal video analysis with LLM for descriptive emotion and expression annotation

This project presents a novel approach to multi-modal emotion and action annotation by integrating facial expression recognition, action recognition, and audio-based emotion analysis into a unified framework. The system utilizes TimesFormer, OpenFace, and SpeechBrain to extract relevant features fro...

Bibliographic Details
Main Author:	Fan, Yupei
Other Authors:	Zheng, Jianmin
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Video understanding; Large language model (LLM); Multimodal analysis; Feature extraction; Deep learning; Emotion annotation
Online Access:	https://hdl.handle.net/10356/180715
Institution:	Nanyang Technological University

Similar Items

Enhancing rumour classification with target-based dual emotion using LLM
by: Li, Yuanhang
Published: (2024)

VISUAL CAUSAL INFERENCE
by: YICONG LI
Published: (2024)

GEO-REFERENCED VIDEO RETRIEVAL: TEXT ANNOTATION AND SIMILARITY SEARCH
by: YIN YIFANG
Published: (2016)

RELATION UNDERSTANDING IN VIDEOS
by: SHANG XINDI
Published: (2021)

In-video product annotation with web information mining
by: Li, G., et al.
Published: (2013)

Large language model (LLM) with retrieve-augmented generation (RAG) for legal case research
by: Liu, Zihao
Published: (2024)

Annotating Objects and Relations in User-Generated Videos
by: Xindi Shang, et al.
Published: (2020)

Beyond distance measurement: Constructing neighborhood similarity for video annotation
by: Wang, M., et al.
Published: (2013)

Fusing pairwise modalities for emotion recognition in conversations
by: Fan, Chunxiao, et al.
Published: (2024)

LLM-based column lineage for relational databases
by: Tan, Yu Ling
Published: (2024)

Interactive state-transition diagrams for visualization of multimodal annotation
by: Podlasov, A., et al.
Published: (2014)

Correlative linear neighborhood propagation for video annotation
by: Tang, J., et al.
Published: (2013)

On the annotation of web videos by efficient near-duplicate search
by: ZHAO, Wan-Lei, et al.
Published: (2010)

Framework to evaluate and test defences against hallucination in large language model
by: Pan, Johnny Shi Han
Published: (2024)

SMU launches LLM (Master of Laws) Programme
by: Singapore Management University
Published: (2011)

Multimodal learning with deep Boltzmann Machine for emotion prediction in user generated videos
by: PANG, Lei, et al.
Published: (2015)

CeleLabel: An interactive system for annotating celebrities in web videos
by: CHEN, Zhineng, et al.
Published: (2014)

Fast semantic diffusion for large-scale context-based image and video annotation
by: JIANG, Yu-Gang, et al.
Published: (2012)

TRANSFORMER TECHNIQUES FOR HUMAN ACTION RECOGNITION AND LOCALIZATION
by: CHANG SHUNING
Published: (2024)

A new gradient based character segmentation method for video text recognition
by: Shivakumara, P., et al.
Published: (2013)

Large-scale sensor-rich video management and delivery
by: SHEN ZHIJIE
Published: (2013)

Cross-modal credibility modelling for EEG-based multimodal emotion recognition
by: Zhang, Yuzhe, et al.
Published: (2024)

Fusion of multimodal embeddings for ad-hoc video search
by: FRANCIS, Danny, et al.
Published: (2019)

Don't ask me what I'm like, just watch and listen
by: Srivastava, R., et al.
Published: (2013)

Towards semantic, debiased and moment video retrieval
by: Satar, Burak
Published: (2025)

Annotation for free: Video tagging by mining user search behavior
by: TING, Yao, et al.
Published: (2013)

Efficient cross-modal video retrieval with meta-optimized frames
by: HAN, Ning, et al.
Published: (2024)

Selective annotation via data allocation: These data should be triaged to experts for annotation rather than the model
by: HUANG, Chen, et al.
Published: (2024)

A preliminary annotated bibliography of Pilipino linguistics, 1604-1976
by: Makarenko, Vladimir A., et al.
Published: (1981)

Towards efficient sparse coding for scalable image annotation
by: Huang, J., et al.
Published: (2014)

One person labels one million images
by: Tang, J., et al.
Published: (2013)

The annotated lexicon of chinese emotion words
by: Ng, Bee Chin, et al.
Published: (2020)

Annotating web images using NOVA: NOn-conVex group spArsity
by: Wu, F., et al.
Published: (2014)

Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition
by: LI, Bobo, et al.
Published: (2023)

Reinforcement learning-based interactive video search
by: MA, Zhixin, et al.
Published: (2022)

Real-time human action recognition by luminance field trajectory analysis
by: Li, Z., et al.
Published: (2014)

DOES ATTEMPTING PRACTICE QUESTIONS DEVELOP CONCEPTUAL UNDERSTANDING? A PRELIMINARY INVESTIGATION
by: TAN ZU WEI, JOEL
Published: (2023)

Multimodal distillation for egocentric video understanding
by: Peng, Han
Published: (2024)

Unified generative and discriminative training for multi-modal Large Language Models
by: CHOW, Wei, et al.
Published: (2024)

VrdONE : One-stage video visual relation detection
by: JIANG, Xinjie, et al.
Published: (2024)