Incorporating window-based passage-level evidence in document retrieval

This study investigated whether document retrieval can be improved if documents are divided into smaller sub-documents or passages and the retrieval scores for these passages are incorporated in the final retrieval score for the whole document. The documents were segmented by sliding a window of a ce...

Bibliographic Details
Main Authors: XI, Wensi, XU-RONG, Richard, KHOO, Christopher Soo Guan, LIM, Ee Peng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2001
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/137
https://ink.library.smu.edu.sg/context/sis_research/article/1136/viewcontent/93502143ba3c89e76ac250f709d94a2a72be.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1136
record_format dspace
spelling sg-smu-ink.sis_research-1136 2018-06-21T08:51:51Z
2001-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/137 info:doi/10.1177/016555150102700202 https://ink.library.smu.edu.sg/context/sis_research/article/1136/viewcontent/93502143ba3c89e76ac250f709d94a2a72be.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description This study investigated whether document retrieval can be improved if documents are divided into smaller sub-documents or passages and the retrieval scores for these passages are incorporated in the final retrieval score for the whole document. The documents were segmented by sliding a window of a certain size across the document and extracting the words displayed each time the window stopped. A retrieval score was calculated for each of the passages extracted, and the highest score obtained by a passage of that size was taken as the document’s passage-level score for that window size. A range of window sizes was tried. The experimental results indicated that using a fixed window size of 50 words gave better results than other window sizes for the TREC-5 and TREC-6 test collections. This window size yielded a significant retrieval improvement of 24% compared to using the whole-document retrieval score (using the traditional tf*idf weighting scheme with cosine normalisation). However, combining this window score and the whole-document retrieval score did not yield a retrieval improvement. Using a variable window size (ranging from 50 to 400 words) yielded a retrieval improvement of about 5% over using a fixed window size of 50. Different window sizes were found to work best for different queries. If the best window size to use for each query could be predicted accurately, a maximum retrieval improvement of 42% could be obtained. Subsequent work suggests that the usefulness of passage-level evidence in document retrieval depends on the weighting scheme and type of normalisation used in the retrieval method.
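The window-based scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how far the window moves at each stop, how text is tokenised, or which idf variant was used, so the half-window step, whitespace tokenisation, and smoothed idf below are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def sliding_windows(tokens, size, step):
    """Return the word windows seen as a window of `size` words moves by `step`."""
    if len(tokens) <= size:
        return [tokens]
    starts = list(range(0, len(tokens) - size + 1, step))
    if starts[-1] != len(tokens) - size:
        starts.append(len(tokens) - size)  # make sure the tail is covered
    return [tokens[s:s + size] for s in starts]


def idf_table(docs_tokens):
    """Smoothed idf = log(1 + N / df); the exact idf variant is an assumption."""
    n = len(docs_tokens)
    df = Counter(term for toks in docs_tokens for term in set(toks))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def cosine_tfidf(query_tokens, text_tokens, idf):
    """Cosine similarity between the tf*idf vectors of a query and a text."""
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_tokens).items()}
    d = {t: c * idf.get(t, 0.0) for t, c in Counter(text_tokens).items()}
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    d_norm = math.sqrt(sum(w * w for w in d.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return dot / (q_norm * d_norm)


def passage_level_score(query_tokens, doc_tokens, idf, size, step=None):
    """Highest cosine tf*idf score over all windows of `size` words."""
    step = step or max(1, size // 2)  # half-window overlap is an assumption
    return max(cosine_tfidf(query_tokens, window, idf)
               for window in sliding_windows(doc_tokens, size, step))


# Toy usage: compare the whole-document score with the best 6-word window.
docs = [
    tokenize("the cat sat on the mat while the dog slept near the door all day long"),
    tokenize("stock markets fell sharply as investors worried about rising interest rates"),
]
idf = idf_table(docs)
query = tokenize("cat mat")
whole_score = cosine_tfidf(query, docs[0], idf)
window_score = passage_level_score(query, docs[0], idf, size=6)
print(whole_score, window_score)
```

In this toy case the best window scores higher than the whole document because cosine normalisation penalises the unrelated words in the longer text. The variable-window experiment in the abstract would correspond to calling `passage_level_score` with several values of `size` (50 to 400 words in the study); how those per-size scores were then combined is not specified in the abstract.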
format text
author XI, Wensi
XU-RONG, Richard
KHOO, Christopher Soo Guan
LIM, Ee Peng
title Incorporating window-based passage-level evidence in document retrieval
publisher Institutional Knowledge at Singapore Management University
publishDate 2001
url https://ink.library.smu.edu.sg/sis_research/137
https://ink.library.smu.edu.sg/context/sis_research/article/1136/viewcontent/93502143ba3c89e76ac250f709d94a2a72be.pdf
_version_ 1770568883008176128