MAE-VQA: an efficient and accurate end-to-end video quality assessment method for user generated content videos

In the digital age, the proliferation of user-generated content (UGC) videos presents unique challenges in maintaining video quality across diverse platforms. In this project, we propose a Masked Auto-Encoder (MAE) model for the no-reference video quality assessment (NR-VQA) problem. To the best of our knowledge, we are the first to apply the MAE to NR-VQA, and we name the resulting model MAE-VQA. MAE-VQA is designed to evaluate the quality of UGC videos without reference footage, which is often unavailable in real-world scenarios. It is composed of three modules: a patch masking module, an auto-encoder module, and a quality regression module, which respectively handle the sampling strategy, capture spatiotemporal representations, and map those representations to a video quality score. This design targets the complex spatiotemporal features and diverse distortions typical of UGC. The Vision Transformer's (ViT) self-attention mechanism allows detailed observation of different parts of a video and captures the correlations between them, extracting features and texture information from the distorted video. Because video content is highly redundant, appropriately sampled features speed up the model without reducing accuracy: by masking the majority of the input video, MAE-VQA lets the ViT learn robust spatiotemporal representations from only a small fraction of the patches. We conduct thorough evaluations on benchmark datasets to compare our method with state-of-the-art techniques. Our approach achieves state-of-the-art performance on the majority of VQA datasets and a close second on the remainder, while significantly reducing computational overhead.
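The record itself contains no code, but the three-module pipeline the abstract describes (patch masking, auto-encoder, quality regression) can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation: the class and argument names, the 75% mask ratio, the 16x16x3 patch size, and all model dimensions are assumptions made for the example.

```python
# Minimal sketch of the three-module MAE-VQA structure from the abstract.
# All names, dimensions, and the mask ratio are illustrative assumptions.
import torch
import torch.nn as nn

class MAEVQA(nn.Module):
    def __init__(self, patch_dim=3 * 16 * 16, embed_dim=384, depth=4,
                 num_heads=6, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Auto-encoder module: a small ViT-style encoder over visible patches.
        self.patch_embed = nn.Linear(patch_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Quality regression module: pooled features -> scalar quality score.
        self.head = nn.Sequential(nn.LayerNorm(embed_dim),
                                  nn.Linear(embed_dim, 1))

    def random_mask(self, patches):
        # Patch masking module: keep only a random subset of patches
        # (the "visible" patches), discarding the masked majority.
        b, n, _ = patches.shape
        keep = max(1, int(n * (1 - self.mask_ratio)))
        idx = torch.rand(b, n, device=patches.device).argsort(dim=1)[:, :keep]
        return torch.gather(patches, 1,
                            idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)))

    def forward(self, patches):
        # patches: (batch, num_patches, patch_dim) flattened video patches.
        visible = self.random_mask(patches)
        feats = self.encoder(self.patch_embed(visible))
        return self.head(feats.mean(dim=1)).squeeze(-1)  # (batch,) scores

# Example: 8 clips, each flattened into 1568 patches of 16x16x3 pixels.
model = MAEVQA()
scores = model(torch.randn(8, 1568, 3 * 16 * 16))
print(scores.shape)  # torch.Size([8])
```

Because only ~25% of the patches reach the encoder, the attention cost drops sharply, which is consistent with the abstract's claim of reduced computational overhead at comparable accuracy.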

Bibliographic Details
Main Author: Wang, Chuhan
Other Authors: Lin Weisi
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/178566
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
Degree: Bachelor's degree
Project Code: SCSE23-0760
Citation: Wang, C. (2024). MAE-VQA: an efficient and accurate end-to-end video quality assessment method for user generated content videos. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/178566