Robust spoken term detection using partial search and re-scoring hypothesized detections techniques

This research focusses on Spoken Term Detection (STD) which aims to detect a textual keyword in a speech corpus. A typical STD system relies on an Automatic Speech Recognition (ASR) system to transform the speech corpus to intermediate textual representations such as 1-best transcriptions or lattice...

Bibliographic Details
Main Author:	Pham, Van Tung
Other Authors:	Chng Eng Siong
Format:	Theses and Dissertations
Language:	English
Published:	2019
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/82987 http://hdl.handle.net/10220/47558
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-82987
record_format	dspace
spelling	sg-ntu-dr.10356-82987 2020-06-24T05:55:14Z Pham, Van Tung Chng Eng Siong School of Computer Science and Engineering Doctor of Philosophy 2019-01-25T03:20:55Z 2019-12-06T15:09:42Z 2019 Thesis Pham, V. T. (2019). Robust spoken term detection using partial search and re-scoring hypothesized detections techniques. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/82987 http://hdl.handle.net/10220/47558 10.32657/10220/47558 en 146 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
description	This research focusses on Spoken Term Detection (STD), which aims to detect a textual keyword in a speech corpus. A typical STD system relies on an Automatic Speech Recognition (ASR) system to transform the speech corpus into intermediate textual representations, such as 1-best transcriptions or word and subword lattices, for indexing and retrieval. However, the imperfect modelling of ASR results in two types of error: misses and false alarms. This thesis aims to address both types of error. In STD, the subword approach is attractive because it can address the Out-of-Vocabulary (OOV) problem. A standard subword-based STD system, referred to as the full search technique, first converts the keyword into a subword sequence and then searches for that sequence in the subword lattices. However, due to the high error rate of subword ASR, detecting the entire subword sequence in the lattices is difficult, which results in a high miss rate. This thesis proposes a partial search approach to address this problem. The proposed approach transforms the keyword’s subword sequence into overlapping sub-sequences and then searches for these sub-sequences in the index. It reduces the miss rate by accepting hypothesized detections that contain some of the keyword’s sub-sequences. STD systems rank hypothesized detections, and make “accept/reject” decisions on them, using confidence scores estimated from the decoding lattices generated by the ASR system. Such scores may be inaccurate due to the imperfect modelling of speech and noise, so using the lattice-based posterior probabilities as detection scores can degrade STD performance. Firstly, the posterior probabilities are observed not to be comparable across keywords. As a result, it is difficult to make “accept/reject” decisions using a single threshold for all keywords. Secondly, a correct detection might have a smaller posterior probability than a false alarm.
Two techniques to re-score and re-rank hypothesized detections are proposed. These techniques utilize additional information that is not captured by the detection scores and hence improve STD performance. The first technique re-scores hypothesized detections using keyword exemplars. A keyword exemplar is a true instance of the keyword obtained from a labelled speech corpus. The main idea is that if a hypothesized detection is acoustically more similar to the keyword exemplars, it is more likely to be a true detection, and hence its score should be boosted. Experimental results show that the proposed technique consistently outperforms previous re-ranking methods that do not make use of keyword exemplars. The second technique re-scores hypothesized detections by exploiting features derived from competing hypotheses. The competing hypotheses of a detection are the alternative hypotheses in the corresponding lattice that have similar time information to the detection. From the competing hypotheses, several novel features are derived. These features reflect the confidence of the hypothesized detection relative to its competing hypotheses, so they are informative and can be used to re-score detections. Experimental results show that using these features results in improved STD performance.
author2	Chng Eng Siong
author_facet	Chng Eng Siong Pham, Van Tung
format	Theses and Dissertations
author	Pham, Van Tung
author_sort	Pham, Van Tung
title	Robust spoken term detection using partial search and re-scoring hypothesized detections techniques
publishDate	2019
url	https://hdl.handle.net/10356/82987 http://hdl.handle.net/10220/47558
_version_	1681056862969004032
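
The partial search idea in the description (split the keyword's subword sequence into overlapping sub-sequences, look each one up in the index, and accept detections that contain some of them) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the window length `n`, the `min_hits` acceptance rule, the dictionary-based toy index, and the grouping of hits by utterance rather than by time alignment are all assumptions made for illustration.

```python
from collections import defaultdict


def overlapping_subsequences(subwords, n=3):
    """Return every length-n window of the sequence, shifted by one subword.

    The window length n is an illustrative assumption.
    """
    if len(subwords) <= n:
        return [tuple(subwords)]
    return [tuple(subwords[i:i + n]) for i in range(len(subwords) - n + 1)]


def partial_search(keyword_subwords, index, n=3, min_hits=2):
    """Toy partial search over a hypothetical inverted index.

    `index` maps a subword n-gram to a list of (utterance_id, start_time)
    entries. An utterance is accepted when at least `min_hits` distinct
    sub-sequences of the keyword are found in it (an assumed rule; a real
    system would also check that the hits are close together in time).
    """
    found = defaultdict(set)
    for sub in overlapping_subsequences(keyword_subwords, n):
        for utt_id, _start_time in index.get(sub, []):
            found[utt_id].add(sub)
    return {utt: len(subs) for utt, subs in found.items() if len(subs) >= min_hits}


# Hypothetical example: a four-subword keyword searched with 2-subword windows.
keyword = ["s", "p", "iy", "ch"]
toy_index = {
    ("s", "p"): [("utt1", 0.5)],
    ("p", "iy"): [("utt1", 0.6), ("utt2", 1.0)],
    ("iy", "ch"): [("utt3", 2.0)],
}
detections = partial_search(keyword, toy_index, n=2, min_hits=2)
print(detections)  # {'utt1': 2}
```

In this toy index no utterance contains all three windows, so a full search for the entire sequence would return nothing, while the partial search still proposes `utt1`. Lowering `min_hits` trades a lower miss rate for more false alarms, which is where a re-scoring stage becomes useful.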


Similar Items