Extraction of Coherent Relevant Passages using Hidden Markov Models

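In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage is presumably highly sensitive to both the query and the document. In this article, we present a new method for accurately detecting coherent relevant passages of variable lengths using hidden Markov models (HMMs). The HMM-based method naturally captures the topical boundaries between passages relevant and nonrelevant to the query. Pseudo-feedback mechanisms can be naturally incorporated into such an HMM-based framework to improve parameter estimation. We show that with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two datasets. We further show how the HMM method can be applied on top of any basic passage extraction method to improve passage boundaries.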
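
The general idea of HMM-based passage extraction can be illustrated with a minimal sketch. This is not the authors' implementation: the three-state layout (background, relevant, background), the smoothing, and the names extract_passage, stay, and mix are all assumptions made for illustration. Viterbi decoding assigns each token to a state, and the tokens labeled relevant form a single variable-length passage.

```python
import math

NEG_INF = float("-inf")


def _log(x):
    """Natural log that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else NEG_INF


def extract_passage(tokens, query_terms, stay=0.9, mix=0.5):
    """Return (start, end) of the decoded relevant span, or None.

    States: 0 = background before, 1 = relevant, 2 = background after.
    All parameters are illustrative assumptions, not estimated values.
    """
    if not tokens:
        return None
    query = set(query_terms)
    vocab = set(tokens) | query
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1

    def background(w):
        # Add-one smoothed unigram model over the document.
        return (counts.get(w, 0) + 1) / (len(tokens) + len(vocab))

    def relevant(w):
        # Mixture of a uniform query model and the background model.
        q = 1.0 / len(query) if w in query else 0.0
        return mix * q + (1 - mix) * background(w)

    emit = [background, relevant, background]
    # Left-to-right transitions; the mass missing from the last row
    # stands for an implicit end-of-document transition.
    trans = [
        [stay, 1 - stay, 0.0],
        [0.0, stay, 1 - stay],
        [0.0, 0.0, stay],
    ]
    start = [0.5, 0.5, 0.0]

    score = [_log(start[s]) + _log(emit[s](tokens[0])) for s in range(3)]
    back = []
    for w in tokens[1:]:
        new_score, ptr = [], []
        for s in range(3):
            best = max(range(3), key=lambda p: score[p] + _log(trans[p][s]))
            new_score.append(score[best] + _log(trans[best][s]) + _log(emit[s](w)))
            ptr.append(best)
        score = new_score
        back.append(ptr)

    # Trace the best state sequence backwards.
    state = max(range(3), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    span = [i for i, s in enumerate(path) if s == 1]
    return (span[0], span[-1] + 1) if span else None


doc = ("the weather was fine and the market opened hidden markov models "
       "segment text hidden markov models label words then lunch was "
       "served and the day ended").split()
result = extract_passage(doc, ["hidden", "markov", "models"])
print(result, doc[result[0]:result[1]] if result else None)
```

In this toy setting the passage boundaries fall where leaving the relevant state becomes cheaper than continuing to emit non-query words from it, which is the sense in which an HMM can capture topical boundaries without presegmenting the document.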

Bibliographic Details
Main Authors: JIANG, Jing, ZHAI, ChengXiang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2006
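Subjects: Algorithms; Hidden Markov models; passage retrieval; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/130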
https://ink.library.smu.edu.sg/context/sis_research/article/1129/viewcontent/p295_jiang.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1129
record_format dspace
spelling sg-smu-ink.sis_research-1129 2020-05-12T03:37:50Z
2006-07-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/130; info:doi/10.1145/1165774.1165775; https://ink.library.smu.edu.sg/context/sis_research/article/1129/viewcontent/p295_jiang.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Algorithms; Hidden Markov models; passage retrieval; Databases and Information Systems; Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Algorithms
Hidden Markov models
passage retrieval
Databases and Information Systems
Numerical Analysis and Scientific Computing
description In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage is presumably highly sensitive to both the query and the document. In this article, we present a new method for accurately detecting coherent relevant passages of variable lengths using hidden Markov models (HMMs). The HMM-based method naturally captures the topical boundaries between passages relevant and nonrelevant to the query. Pseudo-feedback mechanisms can be naturally incorporated into such an HMM-based framework to improve parameter estimation. We show that with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two datasets. We further show how the HMM method can be applied on top of any basic passage extraction method to improve passage boundaries.
format text
author JIANG, Jing
ZHAI, ChengXiang
title Extraction of Coherent Relevant Passages using Hidden Markov Models
publisher Institutional Knowledge at Singapore Management University
publishDate 2006
url https://ink.library.smu.edu.sg/sis_research/130
https://ink.library.smu.edu.sg/context/sis_research/article/1129/viewcontent/p295_jiang.pdf