Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment



Bibliographic Details
Main Author:	Zhang, Huaizheng
Other Authors:	Lau Chiew Tong
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/160371
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-160371
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description	Multimedia content has come to dominate the Internet, fueled by an unprecedented number of multimedia applications. Among them, an emerging class of applications termed Video-to-Retail (V2R) has attracted attention because it seamlessly integrates online video and online retail, providing users with an enhanced Quality of Experience (QoE) in both. This dissertation takes V2R as a case study to examine a requirement common to many multimedia applications: maintaining QoE to increase application providers' revenue while building an efficient backend system to reduce expenditure. Despite previous efforts towards better QoE understanding and efficient system development, existing solutions are insufficient for today's V2R applications, for two reasons. First, previous hand-crafted designs and point solutions tailored to specific datasets lack the scalability required to handle complex and rapidly evolving scenarios. Second, the existing backend infrastructure is outdated and cannot efficiently support the new application paradigm known as Machine-Learning-as-a-Service (MLaaS). To address these two fundamental issues, this dissertation proposes a holistic and practical solution, termed the Multimodal Learning-Centric Cloud Platform (MMLCCP), which aims to offer accurate QoE and content analytics backed by an efficient backend system. The solution is motivated by the observation that Machine Learning (ML) models, especially Multimodal Learning (MML) models, have become the mainstream approach to building V2R-related services such as QoE comprehension and ad analysis. In essence, our platform abstracts model design and model deployment out of current V2R-related service development and decouples them from it. 
It contains: 1) a modeling layer that provides the QoE and V2R content understanding needed to maintain user engagement, and 2) a backend infrastructure that streamlines model deployment and supports efficient model orchestration. This dissertation provides a set of solutions to realize this vision. In the modeling layer, we first design a scalable and configurable MML-based QoE understanding model that learns a unified QoE representation and uses it to perform various QoE prediction tasks. We then propose an MML-based content analysis model that comprehends V2R content and QoE simultaneously. In the backend infrastructure, we first implement an optimized V2R research platform, named Hysia, with which users can rapidly prototype and evaluate their V2R applications. We then optimize Hysia's model deployment module by designing a continuous integration and deployment framework that streamlines model deployment and improves human efficiency. We further enhance Hysia's infrastructure with an automatic model benchmarking tool, so that users can quickly obtain performance analysis reports and use them as guidelines for cost-saving model orchestration. We conduct experiments on many real-world datasets and build many testbeds to verify our solutions. Our state-of-the-art (SOTA) results show that the proposed approaches can substantially improve 1) the performance of QoE and content analysis, and 2) efficiency in terms of both human and system resources. We also derive many new insights from extensive quantitative analysis, which lay a solid foundation for future resource optimization. In addition, we release a set of easy-to-use, open-source tools to facilitate research and help democratize AI. Furthermore, we believe the principles for improving QoE and reducing cost summarized in this dissertation can be readily generalized to other multimedia applications.
author2	Lau Chiew Tong
format	Thesis-Doctor of Philosophy
author	Zhang, Huaizheng
title	Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/160371
School:	School of Computer Science and Engineering
Contact:	ASCTLAU@ntu.edu.sg
Degree:	Doctor of Philosophy
Record timestamp:	2022-08-01T05:07:18Z
Accessioned and available:	2022-07-20T06:43:46Z
Citation:	Zhang, H. (2022). Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160371
DOI:	10.32657/10356/160371
License:	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
File format:	application/pdf
