A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis
Computational video aesthetic prediction refers to using models that automatically evaluate video features to produce aesthetic scores. Current video aesthetic prediction models are designed on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net). Long Short-Term Memory (LSTM) forms the conceptual foundation of the design framework. In the multimodal transformer layer, we employ two distinct transformers, the multimodal transformer and the feature transformer, enabling the model to learn modality-specific patterns and representational features adapted to each modality. The fusion layer is also redesigned to compute both pairwise and overall interactions among the features. This study contributes to the video aesthetic prediction literature by modeling the synergistic effects of textual, audio, and video features in a novel design framework.
Main Authors: KANG, Zhangguang; NAH, Fiona Fui-hoon; SIAU, Keng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects: Computational Video Aesthetic; Multimodal Analysis; Neural Network; Design Science; Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/9962
https://ink.library.smu.edu.sg/context/sis_research/article/10962/viewcontent/ComputationalAestheticDesign_av.pdf
Institution: Singapore Management University
id |
sg-smu-ink.sis_research-10962 |
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-10962 (last updated 2025-01-16T10:09:15Z)
Title: A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis
Authors: KANG, Zhangguang; NAH, Fiona Fui-hoon; SIAU, Keng
Abstract: Computational video aesthetic prediction refers to using models that automatically evaluate the features of videos to produce their aesthetic scores. Current video aesthetic prediction models are designed based on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net) model. The Long Short-Term Memory (LSTM) forms the conceptual foundation for the design framework. In the multimodal transformer layer, we employed two distinct transformers: the multimodal transformer and the feature transformer, enabling the acquisition of modality-specific patterns and representational features uniquely adapted to each modality. The fusion layer has also been redesigned to compute both pairwise interactions and overall interactions among the features. This study contributes to the video aesthetic prediction literature by considering the synergistic effects of textual, audio, and video features. This research presents a novel design framework that considers the combined effects of multimodal features.
Date: 2024-07-01T07:00:00Z
Format: text; application/pdf
URL: https://ink.library.smu.edu.sg/sis_research/9962
DOI: info:doi/10.1007/978-3-031-76821-7_6
Full text: https://ink.library.smu.edu.sg/context/sis_research/article/10962/viewcontent/ComputationalAestheticDesign_av.pdf
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Language: eng
Publisher: Institutional Knowledge at Singapore Management University
Subjects: Computational Video Aesthetic; Multimodal Analysis; Neural Network; Design Science; Databases and Information Systems; Graphics and Human Computer Interfaces |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Computational Video Aesthetic; Multimodal Analysis; Neural Network; Design Science; Databases and Information Systems; Graphics and Human Computer Interfaces |
spellingShingle |
Computational Video Aesthetic; Multimodal Analysis; Neural Network; Design Science; Databases and Information Systems; Graphics and Human Computer Interfaces
KANG, Zhangguang; NAH, Fiona Fui-hoon; SIAU, Keng
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
description |
Computational video aesthetic prediction refers to using models that automatically evaluate video features to produce aesthetic scores. Current video aesthetic prediction models are designed on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net). Long Short-Term Memory (LSTM) forms the conceptual foundation of the design framework. In the multimodal transformer layer, we employ two distinct transformers, the multimodal transformer and the feature transformer, enabling the model to learn modality-specific patterns and representational features adapted to each modality. The fusion layer is also redesigned to compute both pairwise and overall interactions among the features. This study contributes to the video aesthetic prediction literature by modeling the synergistic effects of textual, audio, and video features in a novel design framework. |
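The fusion layer described above computes both pairwise and overall interactions among the textual, audio, and video features. A minimal sketch of that idea, assuming flattened outer products as the interaction operator (the function name and formulation are illustrative assumptions, not the TMTVA-net implementation):

```python
import numpy as np

def pairwise_overall_fusion(text_f, audio_f, video_f):
    # Pairwise interactions: flattened outer product of each modality pair
    pairs = [
        np.outer(text_f, audio_f).ravel(),   # text x audio
        np.outer(text_f, video_f).ravel(),   # text x video
        np.outer(audio_f, video_f).ravel(),  # audio x video
    ]
    # Overall interaction: flattened three-way (trimodal) outer product
    overall = np.einsum('i,j,k->ijk', text_f, audio_f, video_f).ravel()
    return np.concatenate(pairs + [overall])

# Toy 4-dimensional feature vectors, one per modality
rng = np.random.default_rng(0)
text_f, audio_f, video_f = rng.standard_normal((3, 4))
fused = pairwise_overall_fusion(text_f, audio_f, video_f)
print(fused.shape)  # (112,): three pairwise blocks of 16 plus one trimodal block of 64
```

In a real model the fused vector would feed a downstream regressor that outputs the aesthetic score; the sketch only shows how pairwise and overall interaction terms can coexist in one fused representation.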
format |
text |
author |
KANG, Zhangguang; NAH, Fiona Fui-hoon; SIAU, Keng |
author_facet |
KANG, Zhangguang; NAH, Fiona Fui-hoon; SIAU, Keng |
author_sort |
KANG, Zhangguang |
title |
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
title_short |
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
title_full |
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
title_fullStr |
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
title_full_unstemmed |
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
title_sort |
computational aesthetic design science study on online video based on triple-dimensional multimodal analysis |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2024 |
url |
https://ink.library.smu.edu.sg/sis_research/9962
https://ink.library.smu.edu.sg/context/sis_research/article/10962/viewcontent/ComputationalAestheticDesign_av.pdf |
_version_ |
1821833220551868416 |