Incorporating window-based passage-level evidence in document retrieval
Main Authors: | XI, Wensi; XU-RONG, Richard; KHOO, Christopher Soo Guan; LIM, Ee Peng |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2001 |
Subjects: | Databases and Information Systems; Numerical Analysis and Scientific Computing |
Online Access: | https://ink.library.smu.edu.sg/sis_research/137; https://ink.library.smu.edu.sg/context/sis_research/article/1136/viewcontent/93502143ba3c89e76ac250f709d94a2a72be.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-1136 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-1136; 2018-06-21T08:51:51Z; 2001-01-01T08:00:00Z; text; application/pdf; info:doi/10.1177/016555150102700202; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems Numerical Analysis and Scientific Computing |
description |
This study investigated whether document retrieval can be improved if documents are divided into smaller sub-documents, or passages, and the retrieval scores for these passages are incorporated into the final retrieval score for the whole document. The documents were segmented by sliding a window of a given size across each document and extracting the words within the window at each stop. A retrieval score was calculated for each extracted passage, and the highest score obtained by any passage of that size was taken as the document’s passage-level score for that window size. A range of window sizes was tried. The experimental results indicated that a fixed window size of 50 words gave better results than other window sizes on the TREC-5 and TREC-6 test collections. This window size yielded a significant retrieval improvement of 24% over the whole-document retrieval score (computed with the traditional tf*idf weighting scheme and cosine normalisation). However, combining this window score with the whole-document retrieval score did not improve retrieval. Using a variable window size (ranging from 50 to 400 words) yielded a further improvement of about 5% over a fixed window size of 50 words. Different window sizes were found to work best for different queries. If the best window size for each query could be predicted accurately, a maximum retrieval improvement of 42% could be obtained. Subsequent work suggests that the usefulness of passage-level evidence in document retrieval depends on the weighting scheme and type of normalisation used in the retrieval method. |
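The fixed-window passage scoring described in the abstract can be illustrated with a minimal Python sketch: slide a window over the document, score each passage against the query with tf*idf weights and cosine normalisation, and keep the highest passage score. This is not the authors' implementation. The abstract does not give the tokenizer, the exact tf*idf variant, or how far the window moves between stops, so whitespace tokenization, raw tf times idf = log(N/df), a step of half the window size, and every function name here (`sliding_windows`, `passage_score`, and so on) are hypothetical assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace (no stemming)."""
    return text.lower().split()


def sliding_windows(tokens, size, step):
    """Yield passages of `size` words, moving the window `step` words per stop.

    The last window is aligned to the end of the document so trailing words
    are not skipped; a document no longer than `size` is one passage.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    if len(tokens) <= size:
        yield tokens
        return
    start = 0
    while start + size < len(tokens):
        yield tokens[start:start + size]
        start += step
    yield tokens[len(tokens) - size:]


def idf_table(collection):
    """Assumed idf = log(N / df), with df counted over whole documents."""
    n = len(collection)
    df = Counter()
    for tokens in collection:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times idf; terms unseen in the collection weigh 0."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (the cosine normalisation step)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_score(doc_tokens, query_vec, idf):
    """Whole-document tf*idf cosine score (the baseline the abstract compares to)."""
    return cosine(query_vec, tfidf_vector(doc_tokens, idf))


def passage_score(doc_tokens, query_vec, idf, size=50, step=None):
    """Highest cosine score of any `size`-word window in the document.

    `step` is hypothetical (the abstract does not state it); it defaults
    here to half the window size.
    """
    if step is None:
        step = max(1, size // 2)
    return max(
        cosine(query_vec, tfidf_vector(passage, idf))
        for passage in sliding_windows(doc_tokens, size, step)
    )
```

For example, `passage_score(tokenize(doc_text), tfidf_vector(tokenize(query_text), idf), size=50)` would score one document with the 50-word window the abstract reports as the best fixed size. Taking the maximum over windows mirrors "the highest score obtained by a passage of that size". The variable-window scheme (50 to 400 words) is deliberately not reproduced, because the abstract does not say how a window size was chosen for each document or query.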
format |
text |
author |
XI, Wensi XU-RONG, Richard KHOO, Christopher Soo Guan LIM, Ee Peng |
title |
Incorporating window-based passage-level evidence in document retrieval |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2001 |