Vision-language-model-based video quality assessment
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175035
Institution: Nanyang Technological University
Summary: This work introduces a comprehensive approach to video quality assessment (VQA), encompassing both traditional deep-learning-based methods and vision-language-model-based methods. Through the development of the DIVIDE-3k database and the DOVER model, we offer nuanced insights into the multifaceted nature of video quality, capturing both technical and aesthetic dimensions. Further advancements are achieved with the Maxwell database, designed to pinpoint the specific quality factors affecting video perception, and the MaxVQA model, which leverages language-prompted mechanisms for a refined analysis of video quality across multiple dimensions. The findings underscore the complexity of VQA, revealing the significance of both content-based and technical factors in determining video quality. This work not only advances the state of the art in VQA but also sets the stage for future research on evaluating and enhancing the quality of in-the-wild videos.