Robust spoken term detection using partial search and re-scoring hypothesized detections techniques

This research focusses on Spoken Term Detection (STD) which aims to detect a textual keyword in a speech corpus. A typical STD system relies on an Automatic Speech Recognition (ASR) system to transform the speech corpus to intermediate textual representations such as 1-best transcriptions or lattice...

Bibliographic Details
Main Author: Pham, Van Tung
Other Authors: Chng Eng Siong
Format: Theses and Dissertations
Language: English
Published: 2019
Online Access: https://hdl.handle.net/10356/82987
http://hdl.handle.net/10220/47558
Institution: Nanyang Technological University
id sg-ntu-dr.10356-82987
record_format dspace
spelling sg-ntu-dr.10356-82987 2020-06-24T05:55:14Z School of Computer Science and Engineering Doctor of Philosophy 2019-01-25T03:20:55Z 2019-12-06T15:09:42Z 2019 Thesis Pham, V. T. (2019). Robust spoken term detection using partial search and re-scoring hypothesized detections techniques. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/82987 http://hdl.handle.net/10220/47558 10.32657/10220/47558 en 146 p. application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
description This research focusses on Spoken Term Detection (STD), which aims to detect a textual keyword in a speech corpus. A typical STD system relies on an Automatic Speech Recognition (ASR) system to transform the speech corpus into intermediate textual representations, such as 1-best transcriptions or lattices of words and subwords, for indexing and retrieval. However, the imperfect modelling of ASR results in two types of error: misses and false alarms. This thesis aims to address both types of error. In STD, the subword approach has been attractive because it is able to address the Out-of-Vocabulary (OOV) problem. A standard subword-based STD system, referred to as the full search technique, first converts the keyword into a subword sequence, then searches for that subword sequence in the subword lattices. However, due to the high error rate of subword ASR, detecting the entire subword sequence in the lattices is difficult and results in a high miss rate. This thesis proposes a partial search approach to address this problem. The proposed approach transforms the keyword’s subword sequence into overlapping sub-sequences and then searches for these sub-sequences in the index. It reduces the miss rate by accepting hypothesized detections that contain some of the keyword’s sub-sequences. STD systems rank hypothesized detections, and make “accept/reject” decisions on them, using confidence scores estimated from the decoding lattices generated by the ASR. Such scores may be inaccurate due to the imperfect modelling of speech and noise, so using the lattice-based posterior probabilities as the detection scores might result in degraded STD performance. Firstly, it is observed that the posterior probabilities are not comparable across keywords. As a result, it is difficult to make “accept/reject” decisions using a single threshold for all keywords. Secondly, a correct detection might have a smaller posterior probability than false alarm detections.
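The partial search idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy index layout, the function names, the window length `n` and the `min_fraction` acceptance rule are all assumptions made for illustration.

```python
from collections import defaultdict


def overlapping_subsequences(subwords, n):
    """Return every length-n window over the subword sequence, shifted by one."""
    if len(subwords) <= n:
        return [tuple(subwords)]
    return [tuple(subwords[i:i + n]) for i in range(len(subwords) - n + 1)]


def partial_search(keyword_subwords, index, n=3, min_fraction=0.5):
    """Accept utterances that contain at least min_fraction of the sub-sequences.

    `index` is a hypothetical toy index mapping a subword tuple to a list of
    (utterance_id, start_time) hits; a real system would index lattices.
    """
    pieces = set(overlapping_subsequences(keyword_subwords, n))
    matched = defaultdict(set)
    for piece in pieces:
        for utt_id, _start in index.get(piece, []):
            matched[utt_id].add(piece)
    needed = min_fraction * len(pieces)
    return {
        utt_id: len(found) / len(pieces)
        for utt_id, found in matched.items()
        if len(found) >= needed
    }


# Toy example: the last sub-sequence ("t", "s") was lost by the recogniser,
# so a full search over all three pieces would miss utt1.
toy_index = {
    ("k", "ae"): [("utt1", 0.5)],
    ("ae", "t"): [("utt1", 0.6), ("utt2", 1.0)],
}
hits = partial_search(["k", "ae", "t", "s"], toy_index, n=2)
```

In this toy case `utt1` matches two of the three sub-sequences and is accepted, while `utt2` matches only one and is rejected. Lowering `min_fraction` trades fewer misses for more false alarms, which is why the re-scoring step matters.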
Two techniques to re-score and re-rank hypothesized detections are proposed. These techniques utilize additional information that is not captured by the detection scores, and hence improve STD performance. The first technique re-scores hypothesized detections using keyword exemplars. A keyword exemplar is a true instance of the keyword obtained from a labelled speech corpus. The main idea is that if a hypothesized detection is acoustically more similar to the keyword exemplars, it is more likely to be a true detection and hence its score should be boosted. Experimental results show that the proposed technique consistently outperforms previous re-ranking methods that do not make use of keyword exemplars. The second technique re-scores hypothesized detections by exploiting features derived from competing hypotheses. The competing hypotheses of a detection are its alternative hypotheses that have similar time information to the detection in the corresponding lattice. From the competing hypotheses, several novel features are derived. These features reflect the confidence of the hypothesized detection relative to its competing hypotheses, and are informative enough to be used to re-score detections. Experimental results show that using these features results in improved STD performance.
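The exemplar-based re-scoring idea can be sketched as follows. The abstract does not say how acoustic similarity is measured, so dynamic time warping (DTW) over feature vectors, the `exp(-distance)` similarity and the linear interpolation with a `weight` parameter are all assumptions chosen for illustration, not the method used in the thesis.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m] / (n + m)


def rescore(posterior, detection_feats, exemplars, weight=0.5):
    """Interpolate the lattice posterior with an exemplar-similarity term.

    The similarity exp(-average DTW distance) lies in (0, 1]; `weight` is a
    hypothetical tuning parameter.
    """
    avg = sum(dtw_distance(detection_feats, ex) for ex in exemplars) / len(exemplars)
    return (1 - weight) * posterior + weight * math.exp(-avg)


# Toy 1-dimensional features: one exemplar, a detection that resembles it,
# and a false alarm that does not.
exemplar = [[0.0], [1.0], [2.0]]
true_hit = [[0.0], [1.0], [1.0], [2.0]]
false_alarm = [[5.0], [6.0], [7.0]]
boosted = rescore(0.3, true_hit, [exemplar])
penalised = rescore(0.4, false_alarm, [exemplar])
```

Here the true detection starts with the lower posterior (0.3 against 0.4) but ends with the higher combined score, which is the re-ranking effect the abstract describes.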
author2 Chng Eng Siong
format Theses and Dissertations
author Pham, Van Tung
title Robust spoken term detection using partial search and re-scoring hypothesized detections techniques
publishDate 2019
url https://hdl.handle.net/10356/82987
http://hdl.handle.net/10220/47558
_version_ 1681056862969004032