AI-empowered promotional video generation

Bibliographic Details
Main Author: Liu, Chang
Other Authors: Yu Han
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/161247
id sg-ntu-dr.10356-161247
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description Promotional videos are rapidly becoming a popular form of product advertising on e-commerce platforms. The traditional way of producing promotional videos is a time-, skill- and cost-intensive process and is thus usually performed by professional teams. This hinders the production of large-scale video-based promotion campaigns. To address this issue, in this thesis we propose AI-empowered persuasive video generation (AIPVG), which automatically generates promotional videos based on visual materials (i.e., images and video clips) provided by sellers. The goal of AIPVG is to generate videos that are persuasive and offer a good viewing experience. AIPVG can be divided into three steps: 1) visual material understanding; 2) visual storyline generation; and 3) post-production. In this thesis, we focus on three questions that are crucial to AIPVG. Firstly, to achieve a low-level understanding of visual materials, visual material representation models need to be trained on real-world e-commerce product datasets. Since such datasets are usually large-scale and contain many noisy labels, how can we make the representation model robust to label noise? Secondly, since we want to produce persuasive videos, how can we define and measure persuasiveness? Thirdly, how can we achieve a good viewing experience by optimizing the storylines? To address the first issue, we propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML). PRISM calculates the probability of a label being clean and filters out potentially noisy samples.
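The clean-label filtering idea behind PRISM can be illustrated with a small sketch of the AvgSim variant. The function names, the softmax scoring, and the use of per-class mean embeddings as centers are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def avgsim_clean_probability(embeddings, labels, centers, temperature=0.1):
    """AvgSim-style sketch: score each sample by how similar it is to its
    own class center relative to all class centers, and treat the softmax
    score as the probability that its label is clean."""
    # Cosine similarity between each L2-normalized embedding and every class center.
    sims = embeddings @ centers.T                    # (n_samples, n_classes)
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs[np.arange(len(labels)), labels]     # P(label is clean)

def filter_noisy(embeddings, labels, centers, keep_ratio=0.8):
    """Keep the samples most likely to carry clean labels."""
    p_clean = avgsim_clean_probability(embeddings, labels, centers)
    threshold = np.quantile(p_clean, 1.0 - keep_ratio)
    return p_clean >= threshold                      # boolean mask of kept samples
```

In PRISM proper the class centers come from a memory bank updated during training (or from learned proxies in the ProxySim variant); here they would simply be per-class mean embeddings computed offline.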
Specifically, we propose three methods to calculate this probability: 1) the Average Similarity Method (AvgSim), which calculates the average similarity between potentially noisy data and clean data; 2) the Proxy Similarity Method (ProxySim), which replaces the centers maintained by AvgSim with proxies trained by a proxy-based method; and 3) von Mises-Fisher Distribution Similarity (vMF-Sim), which estimates a von Mises-Fisher distribution for each data class. With such a design, the proposed approach can deal with challenging DML situations in which the majority of the samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach achieves up to 8.37% higher Precision@1 than the best-performing state-of-the-art baseline approaches, within reasonable training time. For the second research question, we propose WundtBackpack. It consists of two main parts: 1) the Learnable Wundt Curve, which evaluates the perceived persuasiveness of a sequence of visual materials based on their stimulus intensity and requires only a small volume of data to train; and 2) a clustering-based backpacking algorithm, which generates persuasive sequences of visual materials while respecting video length constraints. In this way, the proposed approach provides a dynamic structure that empowers artificial intelligence (AI) to organize video footage into a sequence of visual stimuli with persuasive power. Extensive real-world experiments show that our approach achieves close to 10% higher perceived persuasiveness scores from human testers and 12.5% higher expected revenue than the best-performing state-of-the-art approach. To provide viewers with a good viewing experience, we propose the Shot Composition, Selection and Plotting (ShotCSP) approach.
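The Wundt curve underlying WundtBackpack is the classic inverted-U relation between stimulus intensity and hedonic response; one common way to model it is as the difference of two sigmoids. The sketch below uses that parameterization together with a greedy length-budgeted selection in the spirit of the backpacking step. All parameters, names, and the greedy heuristic are illustrative assumptions, not the learned model from the thesis:

```python
import math

def wundt_curve(intensity, reward_mid=0.4, punish_mid=0.8, gain=10.0):
    """Inverted-U hedonic response: a reward sigmoid rises first,
    and a punishment sigmoid catches up at higher intensity."""
    reward = 1.0 / (1.0 + math.exp(-gain * (intensity - reward_mid)))
    punishment = 1.0 / (1.0 + math.exp(-gain * (intensity - punish_mid)))
    return reward - punishment

def greedy_backpack(clips, budget):
    """Greedy knapsack over (intensity, duration) clips: pick the clips
    with the best persuasiveness-per-second until the length budget is spent."""
    scored = sorted(clips,
                    key=lambda c: wundt_curve(c["intensity"]) / c["duration"],
                    reverse=True)
    chosen, used = [], 0.0
    for clip in scored:
        if used + clip["duration"] <= budget:
            chosen.append(clip)
            used += clip["duration"]
    return chosen
```

In the thesis the curve's shape is learned from a small volume of human-rated data and the selection is clustering-based; the fixed parameters and greedy loop here only convey the structure of the optimization.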
Designed for generating promotional videos in e-commerce settings, ShotCSP incorporates three key film-making principles into the visual storyline generation pipeline: a) proximity-aware scene transition, b) sound logic flow, and c) graphic discontinuity. We propose two novel metrics to enhance the viewing experience: 1) Semantic Distance, which measures how closely a shot relates to the product being promoted; and 2) Salient Region Ratio, which estimates the attention drawn to product details in a shot. In a large-scale user evaluation involving 1,748 pairwise comparisons against five state-of-the-art approaches, ShotCSP achieves a significantly improved viewing experience, making it a promising approach for enabling AI-generated promotional videos to benefit e-commerce businesses. Together, these approaches provide an innovative way to incorporate best practices from film production and domain knowledge from persuasion theory into AIPVG, thereby moving us closer towards AI-empowered visual persuasion.
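The two ShotCSP metrics can be sketched in a few lines: a Salient Region Ratio as the fraction of a frame's saliency mass that falls inside the product's bounding box, and a Semantic Distance as the cosine distance between a shot embedding and a product embedding. The inputs and function names are illustrative assumptions; the thesis's actual estimators may differ:

```python
import numpy as np

def salient_region_ratio(saliency_map, product_box):
    """Fraction of total saliency mass inside the product's bounding box.
    saliency_map: 2D non-negative array; product_box: (top, left, bottom, right)."""
    top, left, bottom, right = product_box
    total = saliency_map.sum()
    if total == 0:
        return 0.0
    inside = saliency_map[top:bottom, left:right].sum()
    return float(inside / total)

def semantic_distance(shot_emb, product_emb):
    """Cosine distance between a shot's embedding and the product's embedding:
    0 means maximally related, 2 means opposite."""
    cos = np.dot(shot_emb, product_emb) / (
        np.linalg.norm(shot_emb) * np.linalg.norm(product_emb))
    return 1.0 - float(cos)
```

A storyline generator could then prefer shots with low Semantic Distance and high Salient Region Ratio when assembling the sequence.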
spelling sg-ntu-dr.10356-161247 2022-09-01T02:33:19Z AI-empowered promotional video generation Liu, Chang Yu Han School of Computer Science and Engineering han.yu@ntu.edu.sg Engineering::Computer science and engineering Doctor of Philosophy 2022-08-22T07:30:35Z 2022-08-22T07:30:35Z 2022 Thesis-Doctor of Philosophy Liu, C. (2022). AI-empowered promotional video generation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/161247 10.32657/10356/161247 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University