Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment

Multimedia content has dominated the Internet, fueled by an unprecedented number of multimedia applications. Among them, a kind of emerging application, termed Video-to-Retail (V2R), has attracted attention as it can seamlessly integrate both online video and online retail and provide users with enh...

Bibliographic Details
Main Author: Zhang, Huaizheng
Other Authors: Lau Chiew Tong
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/160371
Institution: Nanyang Technological University
id sg-ntu-dr.10356-160371
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Zhang, Huaizheng
Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
description Multimedia content has dominated the Internet, fueled by an unprecedented number of multimedia applications. Among them, an emerging class of application, termed Video-to-Retail (V2R), has attracted attention because it can seamlessly integrate online video and online retail and provide users with enhanced Quality of Experience (QoE) across both. This dissertation takes V2R as an example to examine a requirement common to many multimedia applications: maintaining QoE to increase application providers' revenue while building an efficient backend system to reduce expenditure. Despite previous efforts towards better QoE understanding with efficient system development, existing solutions are insufficient to meet the requirements of today's V2R applications. The reason is two-fold. First, previous hand-crafted designs and point solutions for specific datasets do not provide the scalability required to handle complex and rapidly evolving scenarios. Second, the existing backend infrastructure is out of date and cannot efficiently support the new application paradigm, named Machine-Learning-as-a-Service (MLaaS). To address these two fundamental issues, this dissertation proposes a holistic and practical solution, termed Multimodal Learning-Centric Cloud Platform (MMLCCP), which aims to offer accurate QoE and content analytics with efficient backend system support. The solution is inspired by the fact that Machine Learning (ML) models, especially Multimodal Learning (MML) models, have become the mainstream way to build V2R-related services such as QoE comprehension and ad analysis. In essence, our platform abstracts and decouples model design and model deployment from current V2R-related service development.
It contains: 1) a modeling layer that offers the QoE and V2R content understanding needed to maintain user engagement, and 2) a backend infrastructure that streamlines model deployment and supports efficient model orchestration. This dissertation provides a set of solutions to realize our vision. In the modeling layer, we first design a scalable and configurable QoE understanding model based on MML that learns a unified QoE representation and uses it to perform various QoE prediction tasks. We then propose an MML-based content analysis model to comprehend both V2R content and QoE simultaneously. In the backend infrastructure, we first implement an optimized V2R research platform, named Hysia, for users to rapidly prototype and evaluate their V2R applications. We then optimize Hysia's model deployment module to streamline model deployment and improve human efficiency by designing a continuous integration and deployment framework. We further enhance Hysia's infrastructure by implementing an automatic model benchmarking tool so that users can quickly obtain performance analysis reports and use them as guidelines for model orchestration to save cost. We conduct experiments on many real-world datasets, and build many testbeds, to verify our solutions. Our state-of-the-art (SOTA) results show that the proposed approaches can substantially improve: 1) the performance of QoE and content analysis, and 2) efficiency in terms of both human resources and system resources. Meanwhile, we obtain many new insights from extensive quantitative analysis, which lay a solid foundation for future resource optimization. In addition, we release a set of easy-to-use, open-source tools to facilitate research and democratize AI. Furthermore, we believe the principles of improving QoE and reducing cost, summarized in the dissertation, can be easily generalized to other multimedia applications.
author2 Lau Chiew Tong
author_facet Lau Chiew Tong
Zhang, Huaizheng
format Thesis-Doctor of Philosophy
author Zhang, Huaizheng
author_sort Zhang, Huaizheng
title Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
title_short Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
title_full Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
title_fullStr Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
title_full_unstemmed Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
title_sort multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/160371
_version_ 1743119548850110464
spelling sg-ntu-dr.10356-160371 2022-08-01T05:07:18Z Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment Zhang, Huaizheng Lau Chiew Tong School of Computer Science and Engineering ASCTLAU@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Doctor of Philosophy 2022-07-20T06:43:46Z 2022-07-20T06:43:46Z 2022 Thesis-Doctor of Philosophy Zhang, H. (2022). Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160371 https://hdl.handle.net/10356/160371 10.32657/10356/160371 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University