Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/160371 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-160371 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
spellingShingle |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Zhang, Huaizheng Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
description |
Multimedia content has dominated the Internet, fueled by an unprecedented number of multimedia applications. Among them, an emerging kind of application, termed Video-to-Retail (V2R), has attracted attention because it seamlessly integrates online video and online retail and provides users with enhanced Quality of Experience (QoE) in both. This dissertation takes V2R as an example to examine a requirement common to many multimedia applications: maintaining QoE to increase application providers' revenue while building an efficient backend system to save expenditure.
Despite previous efforts towards better QoE understanding and efficient system development, existing solutions are insufficient to meet the requirements of today's V2R applications. The reason is twofold. First, previous hand-crafted designs and point solutions for specific datasets do not provide the scalability required to handle complex and rapidly evolving scenarios. Second, the existing backend infrastructure is out of date and cannot efficiently support the new application paradigm, named Machine-Learning-as-a-Service (MLaaS).
To address these two fundamental issues, this dissertation proposes a holistic and practical solution, termed Multimodal Learning-Centric Cloud Platform (MMLCCP), which aims to offer accurate QoE and content analytics with efficient backend system support. The solution is inspired by the fact that Machine Learning (ML) models, especially Multimodal Learning (MML) models, have become the mainstream way to build V2R-related services such as QoE comprehension and ad analysis. In essence, our platform abstracts and decouples model design and model deployment from current V2R-related service development. It contains: 1) a modeling layer that offers the QoE and V2R content understanding necessary to maintain user engagement, and 2) a backend infrastructure that streamlines model deployment and supports efficient model orchestration.
This dissertation provides a set of solutions to realize our vision. In the modeling layer, we first design a scalable and configurable QoE understanding model based on MML that learns a unified QoE representation and uses it to perform various QoE prediction tasks. We then propose an MML-based content analysis model to comprehend both V2R content and QoE simultaneously. In the backend infrastructure, we first implement an optimized V2R research platform, named Hysia, for users to rapidly prototype and evaluate their V2R applications. We then optimize Hysia's model deployment module to streamline model deployment and improve human efficiency by designing a continuous integration and deployment framework. We further enhance Hysia's infrastructure by implementing an automatic model benchmarking tool so that users can quickly obtain performance analysis reports and use them as guidelines for model orchestration to save cost.
We conduct experiments on many real-world datasets and build many testbeds to verify our solutions. Our state-of-the-art (SOTA) results show that the proposed approaches can substantially improve: 1) the performance of QoE and content analysis, and 2) efficiency in terms of both human resources and system resources. Meanwhile, we obtain many new insights from extensive quantitative analysis, which lay a solid foundation for future resource optimization. In addition, we release a set of easy-to-use, open-source tools to facilitate research and democratize AI. Furthermore, we believe the principles of improving QoE and reducing cost, summarized in the dissertation, can be easily generalized to other multimedia applications. |
author2 |
Lau Chiew Tong |
author_facet |
Lau Chiew Tong Zhang, Huaizheng |
format |
Thesis-Doctor of Philosophy |
author |
Zhang, Huaizheng |
author_sort |
Zhang, Huaizheng |
title |
Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
title_short |
Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
title_full |
Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
title_fullStr |
Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
title_full_unstemmed |
Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
title_sort |
multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/160371 |
_version_ |
1743119548850110464 |
spelling |
sg-ntu-dr.10356-160371 2022-08-01T05:07:18Z Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment Zhang, Huaizheng Lau Chiew Tong School of Computer Science and Engineering ASCTLAU@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Doctor of Philosophy 2022-07-20T06:43:46Z 2022-07-20T06:43:46Z 2022 Thesis-Doctor of Philosophy Zhang, H. (2022). Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160371 https://hdl.handle.net/10356/160371 10.32657/10356/160371 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |