Audio captioning and retrieval with improved cross-modal objectives

Automated Audio Captioning (AAC) is the task of generating descriptive captions from an input audio clip, while Language-Based Audio Retrieval (LBAR) is the task of retrieving the most relevant audio clip based on an input text query. AAC requires a model that is not only able to comprehend the acou...

Bibliographic Details
Main Author:	Koh, Andrew Jin Jie
Other Authors:	Chng Eng Siong
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/172437
Institution:	Nanyang Technological University

Similar Items