CATNet: Cross-modal fusion for audio-visual speech recognition
Automatic speech recognition (ASR) is a typical pattern recognition technology that converts human speech into text. With the aid of advanced deep learning models, the performance of speech recognition has improved significantly. In particular, the emerging Audio–Visual Speech Recognition (AVSR) meth...
Main Authors: | WANG, Xingmei; MI, Jianchen; LI, Boquan; ZHAO, Yixu; MENG, Jiaxiang |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8645 https://ink.library.smu.edu.sg/context/sis_research/article/9648/viewcontent/CatNet_av.pdf |
Institution: | Singapore Management University |
Similar Items
- CROSS-MODALITY COMPLEMENTARITY FOR AUDIO-VISUAL SPEECH RECOGNITION
  by: WANG JIADONG
  Published: (2024)
- Audio-visual modeling for bimodal speech recognition
  by: Kaynak, M.N., et al.
  Published: (2014)
- AV-FDTI: audio-visual fusion for drone threat identification
  by: Yang, Yizhuo, et al.
  Published: (2024)
- AUDIO-VISUAL ACTIVE SPEAKER DETECTION AND RECOGNITION
  by: TAO RUIJIE
  Published: (2023)
- Attentive Moment Retrieval in Videos
  by: Meng Liu, et al.
  Published: (2020)