A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis

Computational video aesthetic prediction refers to using models that automatically evaluate the features of videos to produce their aesthetic scores. Current video aesthetic prediction models are designed based on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net) model. The Long Short-Term Memory (LSTM) forms the conceptual foundation for the design framework. In the multimodal transformer layer, we employed two distinct transformers: the multimodal transformer and the feature transformer, enabling the acquisition of modality-specific patterns and representational features uniquely adapted to each modality. The fusion layer has also been redesigned to compute both pairwise interactions and overall interactions among the features. This study contributes to the video aesthetic prediction literature by considering the synergistic effects of textual, audio, and video features. This research presents a novel design framework that considers the combined effects of multimodal features.
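The abstract describes a fusion layer that computes both pairwise and overall interactions among textual, audio, and video features. A minimal sketch of that idea, assuming element-wise products as the interaction operator (the function and variable names here are illustrative assumptions, not the paper's exact design):

```python
# Hypothetical sketch of trimodal fusion: given per-modality feature
# vectors (text, audio, video), keep the unimodal features, add all
# pairwise element-wise interactions, and add one overall three-way
# interaction, then concatenate everything into a single fused vector.
from itertools import combinations

def elementwise(u, v):
    """Element-wise product of two equal-length feature vectors."""
    return [a * b for a, b in zip(u, v)]

def fuse(text, audio, video):
    feats = [text, audio, video]
    fused = []
    # Unimodal features pass through unchanged.
    for f in feats:
        fused.extend(f)
    # Pairwise interactions: text*audio, text*video, audio*video.
    for u, v in combinations(feats, 2):
        fused.extend(elementwise(u, v))
    # Overall (three-way) interaction across all modalities.
    fused.extend(elementwise(elementwise(text, audio), video))
    return fused

# With 2-dim features: 3 unimodal + 3 pairwise + 1 overall segments = 14 values.
vec = fuse([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
assert len(vec) == 14
```

In practice such a fused vector would feed a regression head that outputs the aesthetic score; this sketch only illustrates the interaction structure, not the paper's trained network.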

Bibliographic Details
Main Authors: KANG, Zhangguang, NAH, Fiona Fui-hoon, SIAU, Keng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9962
https://ink.library.smu.edu.sg/context/sis_research/article/10962/viewcontent/ComputationalAestheticDesign_av.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-10962
Record Format: dspace
Date Published: 2024-07-01
DOI: 10.1007/978-3-031-76821-7_6
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems, InK@SMU, SMU Libraries
Subjects: Computational Video Aesthetic; Multimodal Analysis; Neural Network; Design Science; Databases and Information Systems; Graphics and Human Computer Interfaces