A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis


Full Description

Bibliographic Details
Main Authors: KANG, Zhangguang, NAH, Fiona Fui-hoon, SIAU, Keng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/9962
https://ink.library.smu.edu.sg/context/sis_research/article/10962/viewcontent/ComputationalAestheticDesign_av.pdf
Institution: Singapore Management University
Language: English
Item Description
Summary: Computational video aesthetic prediction refers to using models that automatically evaluate the features of videos to produce their aesthetic scores. Current video aesthetic prediction models are designed based on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net) model. The Long Short-Term Memory (LSTM) forms the conceptual foundation for the design framework. In the multimodal transformer layer, we employed two distinct transformers, the multimodal transformer and the feature transformer, enabling the model to learn modality-specific patterns and representational features adapted to each modality. The fusion layer was also redesigned to compute both pairwise interactions and overall interactions among the features. This study contributes to the video aesthetic prediction literature by considering the synergistic effects of textual, audio, and video features. This research presents a novel design framework that considers the combined effects of multimodal features.
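The summary states that the fusion layer computes both pairwise interactions and the overall interaction among the textual, audio, and video features. A minimal sketch of that idea, assuming an element-wise-product formulation over equal-length feature vectors (the function name and the product-based interaction are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def fuse_trimodal(text, audio, video):
    """Hypothetical trimodal fusion: concatenate the three unimodal feature
    vectors, their three pairwise element-wise interactions, and the single
    overall (three-way) interaction into one fused representation."""
    pairwise = [text * audio, text * video, audio * video]
    overall = text * audio * video
    return np.concatenate([text, audio, video, *pairwise, overall])

# Toy 4-dimensional features per modality.
t = np.ones(4)        # text features
a = 2.0 * np.ones(4)  # audio features
v = 3.0 * np.ones(4)  # video features

fused = fuse_trimodal(t, a, v)
# 3 unimodal + 3 pairwise + 1 overall = 7 blocks of 4 dims = 28 dims
```

In a full model the inputs would be the transformer-layer outputs for each modality, and the fused vector would feed a regression head that emits the aesthetic score.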