AI-empowered promotional video generation
Promotional videos are rapidly becoming a popular form of product advertising on E-commerce platforms. The traditional way of producing promotional videos is a time-, skill- and cost-intensive process and is thus usually performed by professional teams. This hinders the production of large-scale video...
| Main Author: | Liu, Chang |
|---|---|
| Other Authors: | Yu Han |
| Format: | Thesis-Doctor of Philosophy |
| Language: | English |
| Published: | Nanyang Technological University, 2022 |
| Subjects: | Engineering::Computer science and engineering |
| Online Access: | https://hdl.handle.net/10356/161247 |
| Institution: | Nanyang Technological University |
| Field | Value |
|---|---|
| id | sg-ntu-dr.10356-161247 |
| record_format | dspace |
| institution | Nanyang Technological University |
| building | NTU Library |
| continent | Asia |
| country | Singapore |
| content_provider | NTU Library |
| collection | DR-NTU |
| language | English |
| topic | Engineering::Computer science and engineering |
| spellingShingle | Engineering::Computer science and engineering Liu, Chang AI-empowered promotional video generation |

description:
Promotional videos are rapidly becoming a popular form of product advertising on E-commerce platforms. The traditional way of producing promotional videos is a time-, skill- and cost-intensive process and is thus usually performed by professional teams. This hinders the production of large-scale video-based promotion campaigns. To address this issue, in this thesis we propose AI-empowered persuasive video generation (AIPVG), which automatically generates promotional videos based on visual materials (i.e., images and video clips) provided by sellers. The goal of AIPVG is to generate videos that are persuasive and offer a good viewing experience. AIPVG can be divided into three steps: 1) visual material understanding; 2) visual storyline generation; and 3) post-production.
In this thesis, we focus on three questions that are crucial to AIPVG. Firstly, to achieve a low-level understanding of visual materials, visual material representation models need to be trained on real-world E-commerce product datasets. Since such datasets are usually large-scale and contain many noisy labels, how can we make the representation model robust to label noise? Secondly, since we want to produce persuasive videos, how can we define and measure persuasiveness? Thirdly, how can we achieve a good viewing experience by optimizing the storylines?
To address the first issue, we propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML). PRISM calculates the probability of a label being clean and filters out potentially noisy samples. Specifically, we propose three methods to calculate this probability: 1) the Average Similarity Method (AvgSim), which calculates the average similarity between potentially noisy data and clean data; 2) the Proxy Similarity Method (ProxySim), which replaces the centers maintained by AvgSim with proxies trained by a proxy-based method; and 3) von Mises-Fisher Distribution Similarity (vMF-Sim), which estimates a von Mises-Fisher distribution for each data class. With such a design, the proposed approach can deal with challenging DML situations in which the majority of the samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach achieves up to 8.37% higher Precision@1 than the best-performing state-of-the-art baselines, within reasonable training time.
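The AvgSim idea above can be sketched as follows. This is a minimal illustration only: it assumes embeddings and per-class clean centers are already computed, and the function names and the min-max pseudo-probability normalisation are our own, not the thesis's actual implementation.

```python
import numpy as np

def avgsim_clean_probability(embeddings, labels, centers):
    """Score each sample by cosine similarity to its labeled class center.

    A higher score suggests the label is more likely clean. `centers`
    maps class id -> L2-normalised center vector (in PRISM these are
    maintained from a memory bank; here they are simply given).
    """
    # L2-normalise embeddings so the dot product equals cosine similarity
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.array([emb[i] @ centers[labels[i]] for i in range(len(labels))])
    # Min-max normalisation into a pseudo-probability of being clean
    return (sims - sims.min()) / (sims.max() - sims.min() + 1e-8)

def filter_noisy(embeddings, labels, centers, keep_ratio=0.8):
    """Keep the `keep_ratio` fraction of samples most likely to be clean."""
    probs = avgsim_clean_probability(embeddings, labels, centers)
    k = int(len(labels) * keep_ratio)
    return np.argsort(-probs)[:k]
```

A sample whose embedding sits far from its labeled class center receives a low score and is dropped from the training batch.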
For the second research question, we propose WundtBackpack. It consists of two main parts: 1) the Learnable Wundt Curve, which evaluates perceived persuasiveness based on the stimulus intensity of a sequence of visual materials and requires only a small volume of data to train; and 2) a clustering-based backpacking algorithm, which generates persuasive sequences of visual materials while respecting video length constraints. In this way, the proposed approach provides a dynamic structure that empowers artificial intelligence (AI) to organize video footage into a sequence of visual stimuli with persuasive power. Extensive real-world experiments show that our approach achieves close to 10% higher perceived persuasiveness scores from human testers, and 12.5% higher expected revenue, compared to the best-performing state-of-the-art approach.
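As a toy illustration of the second part, the length-constrained selection can be framed as a knapsack over clips, where each clip's value comes from an inverted-U (Wundt-style) curve of its stimulus intensity. Everything here is hypothetical: the thesis learns the curve from data and uses a clustering-based algorithm, whereas this sketch uses a fixed Gaussian-shaped curve and a greedy value-per-second heuristic.

```python
import math

def wundt_value(intensity, peak=0.6, width=0.3):
    """Toy inverted-U (Wundt-style) response: appeal peaks at a moderate
    stimulus intensity and falls off on either side. The learnable
    version in the thesis fits this shape from data; the peak and width
    here are fixed illustrative constants."""
    return math.exp(-((intensity - peak) ** 2) / (2 * width ** 2))

def select_clips(clips, max_length):
    """Greedy knapsack: pick clips by value-per-second until the video
    length budget is exhausted. `clips` is a list of
    (duration_sec, stimulus_intensity) tuples."""
    scored = sorted(
        enumerate(clips),
        key=lambda ic: wundt_value(ic[1][1]) / ic[1][0],
        reverse=True,
    )
    chosen, total = [], 0.0
    for i, (duration, _intensity) in scored:
        if total + duration <= max_length:
            chosen.append(i)
            total += duration
    return sorted(chosen), total
```

The greedy heuristic trades optimality for speed; a dynamic-programming knapsack would find the exact optimum at higher cost.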
To provide viewers with a good viewing experience, we propose the Shot Composition, Selection and Plotting (ShotCSP) approach. Designed for generating promotional videos in e-commerce settings, ShotCSP incorporates three key film-making principles into the visual storyline generation pipeline: a) proximity-aware scene transition, b) sound logic flow, and c) graphic discontinuity. We propose two novel metrics to enhance the viewing experience: 1) Semantic Distance, which measures how related a shot is to the product being promoted; and 2) Salient Region Ratio, which estimates the attention paid to product details in a shot. Through a large-scale user evaluation involving 1,748 pairwise comparisons against five state-of-the-art approaches, ShotCSP achieves a significantly improved viewing experience. It is a promising approach for enabling AI-generated promotional videos to benefit e-commerce businesses.
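The two metrics can be sketched numerically as follows. This assumes a binary saliency mask and fixed-size embeddings are already available from upstream models (which are outside this sketch), and the exact definitions in the thesis may differ.

```python
import numpy as np

def salient_region_ratio(saliency_mask):
    """Fraction of the frame area occupied by the salient (product)
    region. `saliency_mask` is a binary HxW array; in practice it
    would come from a saliency-detection model."""
    return float(saliency_mask.sum()) / saliency_mask.size

def semantic_distance(shot_emb, product_emb):
    """Cosine distance between a shot embedding and the promoted
    product's embedding: 0 means maximally related, 2 means opposite."""
    a = shot_emb / np.linalg.norm(shot_emb)
    b = product_emb / np.linalg.norm(product_emb)
    return 1.0 - float(a @ b)
```

A storyline generator could then prefer shots with low Semantic Distance and a Salient Region Ratio in a target range, so the product stays visually prominent without filling every frame.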
These approaches provide an innovative way to incorporate best practices from film production and domain knowledge from persuasion theory into AIPVG, thereby moving us closer towards AI-empowered visual persuasion.
| Field | Value |
|---|---|
| author2 | Yu Han |
| author_facet | Yu Han Liu, Chang |
| format | Thesis-Doctor of Philosophy |
| author | Liu, Chang |
| author_sort | Liu, Chang |
| title | AI-empowered promotional video generation |
| title_short | AI-empowered promotional video generation |
| title_full | AI-empowered promotional video generation |
| title_fullStr | AI-empowered promotional video generation |
| title_full_unstemmed | AI-empowered promotional video generation |
| title_sort | ai-empowered promotional video generation |
| publisher | Nanyang Technological University |
| publishDate | 2022 |
| url | https://hdl.handle.net/10356/161247 |
| _version_ | 1744365386357276672 |
| Field | Value |
|---|---|
| spelling (record) | sg-ntu-dr.10356-161247, last updated 2022-09-01T02:33:19Z |
| Supervisor | Yu Han, School of Computer Science and Engineering (han.yu@ntu.edu.sg) |
| Degree | Doctor of Philosophy |
| Deposited | 2022-08-22T07:30:35Z |
| Citation | Liu, C. (2022). AI-empowered promotional video generation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/161247 |
| DOI | 10.32657/10356/161247 |
| License | Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) |
| File format | application/pdf |
| Publisher | Nanyang Technological University |