Learning to rank only using training data from related domain

Like traditional supervised and semi-supervised algorithms, learning to rank for information retrieval requires document annotations provided by domain experts. It is costly to annotate training data for different search domains and tasks. We propose to exploit training data annotated for a related...


Bibliographic Details
Main Authors: GAO, Wei; CAI, Peng; WONG, Kam-Fai; ZHOU, Aoying
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2010
Subjects: Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/4597
https://ink.library.smu.edu.sg/context/sis_research/article/5600/viewcontent/p162_gao.pdf
id sg-smu-ink.sis_research-5600
record_format dspace
date 2010-07-01T07:00:00Z
format application/pdf
doi info:doi/10.1145/1835449.1835478
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
description Like traditional supervised and semi-supervised learning algorithms, learning to rank for information retrieval requires document annotations provided by domain experts. It is costly to annotate training data for different search domains and tasks. We propose to exploit training data annotated for a related domain to learn to rank retrieved documents in the target domain, in which no labeled data is available. We present a simple yet effective approach based on an instance-weighting scheme. Our method first estimates the importance of each related-domain document relative to the target domain. Heuristics are then studied to transform the importance of individual documents into pairwise weights of document pairs, which can be directly incorporated into popular ranking algorithms. Owing to the importance weighting, a ranking model trained on the related domain is highly adaptable to the data of the target domain. Ranking adaptation experiments on the LETOR 3.0 dataset [27] demonstrate that, with a fair amount of related-domain training data, our method significantly outperforms the baseline without weighting and, most of the time, is not significantly worse than an "ideal" model trained directly on the target domain.
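The description above outlines the approach at a high level: estimate how representative each related-domain document is of the target domain, convert those per-document importances into per-pair weights, and let the weights scale the loss of a standard pairwise ranker. The short Python sketch below only illustrates that pipeline; the logistic-regression domain classifier, the averaging heuristic for pair weights, and the weighted pairwise hinge loss are assumed stand-ins, not the estimator or heuristics actually studied in the paper.

```python
# Minimal sketch of instance weighting for ranking adaptation.
# Assumptions (not from the paper): a logistic-regression domain
# classifier estimates document importance, pair weights are the mean
# of the two documents' weights, and the ranker uses a weighted
# pairwise hinge loss.
import numpy as np
from sklearn.linear_model import LogisticRegression


def estimate_document_importance(X_source, X_target):
    """Estimate how 'target-like' each related-domain document is.

    Trains a classifier to separate source (0) from target (1) feature
    vectors and uses P(target | x) as the importance of each source doc.
    """
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.predict_proba(X_source)[:, 1]


def pair_weight(w_i, w_j):
    """One simple heuristic: average the two documents' importances."""
    return 0.5 * (w_i + w_j)


def weighted_pairwise_hinge_loss(scores, pairs, pair_weights, margin=1.0):
    """Pairwise hinge loss where each preference pair (i ranked above j)
    is scaled by its weight -- the point where importance weighting
    plugs into an ordinary pairwise ranker."""
    loss = 0.0
    for (i, j), w in zip(pairs, pair_weights):
        loss += w * max(0.0, margin - (scores[i] - scores[j]))
    return loss / max(len(pairs), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_src = rng.normal(0.0, 1.0, size=(100, 5))  # labeled related-domain docs
    X_tgt = rng.normal(0.5, 1.0, size=(80, 5))   # unlabeled target-domain docs

    doc_w = estimate_document_importance(X_src, X_tgt)

    # Preference pairs (i preferred over j) come from the related domain's labels.
    pairs = [(0, 1), (2, 3)]
    pair_w = [pair_weight(doc_w[i], doc_w[j]) for i, j in pairs]

    scores = rng.normal(size=len(X_src))  # stand-in for a ranking model's scores
    print(weighted_pairwise_hinge_loss(scores, pairs, pair_w))
```

Any pairwise ranking objective could accept the pair weights in the same way, by scaling each pair's contribution to the loss, which is the sense in which the weights are "directly incorporated" into existing ranking algorithms.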
format text
author GAO, Wei
CAI, Peng
WONG, Kam-Fai
ZHOU, Aoying
title Learning to rank only using training data from related domain
publisher Institutional Knowledge at Singapore Management University
publishDate 2010
url https://ink.library.smu.edu.sg/sis_research/4597
https://ink.library.smu.edu.sg/context/sis_research/article/5600/viewcontent/p162_gao.pdf