A link-bridged topic model for cross-domain document classification

Transfer learning utilizes labeled data available from a related (source) domain to achieve effective knowledge transfer to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationships among them. In this paper, we propose a novel cross-domain document classification approach called the Link-Bridged Topic model (LBT). LBT consists of two key steps. First, LBT utilizes an auxiliary link network to discover direct and indirect co-citation relationships among documents by embedding this background knowledge into a graph kernel; the mined co-citation relationships are leveraged to bridge the gap between domains. Second, LBT combines content information and link structure in a unified latent topic model, based on the assumption that documents of the source and target domains share common topics from the point of view of both content and links. By mapping both domains' data into the latent topic space, LBT encodes domain commonality and difference as shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as with content and link statistics; the shared topics then act as a bridge to facilitate knowledge transfer from the source domain to the target domain. Experiments on several types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.
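The abstract says background knowledge is embedded into a graph kernel to capture direct and indirect co-citation, but this record does not specify which kernel. One common choice with exactly that property is the von Neumann (diffusion) kernel; the sketch below applies it to a toy citation graph. The graph, the decay factor, and the kernel choice itself are illustrative assumptions, not the paper's confirmed construction.

```python
import numpy as np

# Toy citation graph over 5 documents: A[i, j] = 1 if document i cites j.
# (Illustrative data only.)
A = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
], dtype=float)

def von_neumann_kernel(A, alpha=0.1):
    """K = sum_{k>=1} alpha^k B^k = (I - alpha*B)^{-1} - I, with B = A.T @ A.

    B[i, j] counts documents that cite both i and j (direct co-citation);
    higher powers of B add indirect co-citation paths with geometrically
    decaying weight. The series converges when alpha < 1 / lambda_max(B).
    """
    B = A.T @ A
    lam = np.max(np.abs(np.linalg.eigvalsh(B)))  # B is symmetric PSD
    if alpha * lam >= 1.0:
        raise ValueError("alpha too large for the series to converge")
    n = B.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * B) - np.eye(n)

K = von_neumann_kernel(A)
# K is symmetric, and K[1, 2] > 0 because document 0 cites both 1 and 2.
```

The closed form replaces the infinite sum of matrix powers with a single matrix inverse, so indirect co-citation of any path length is accounted for in one computation.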

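The second step described in the abstract fits one latent topic model jointly to content and link statistics, so that both views share per-document topic mixtures. The record does not give the model's likelihood or fitting procedure, so the sketch below is only a loose analogue: a plain nonnegative matrix factorization of a stacked content-plus-link matrix, with all data, dimensions, and the NMF routine itself being illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 4 source + 4 target documents over a 6-word vocabulary,
# plus a symmetric nonnegative 8x8 document-link kernel.
X_content = rng.random((6, 8))                 # word-by-document counts
R = rng.random((8, 8))
K_link = R @ R.T                               # PSD link kernel

# Stacking both views forces the factorization to explain content AND
# links with the same per-document topic mixtures H.
V = np.vstack([X_content, K_link])             # (6 + 8) x 8

def nmf(V, k=3, iters=500):
    """Lee-Seung multiplicative updates for V ~ W @ H with W, H >= 0."""
    m, n = V.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ (H @ H.T) + 1e-9)
    return W, H

W, H = nmf(V)
# Columns of H are shared-topic mixtures: the first 4 (source) columns can
# train a classifier that is then applied to the last 4 (target) columns,
# using the shared topics as the cross-domain bridge.
```

This mirrors only the bridging idea, that a single shared topic space is constrained by both content and link statistics from both domains; the paper's actual model is probabilistic, which NMF is not.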

Bibliographic Details
Main Authors: YANG, Pei, GAO, Wei, TAN, Qi, WONG, Kam-Fai
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2013
Subjects: Cross-domain; Document classification; Transfer learning; Auxiliary link network; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/4550
https://ink.library.smu.edu.sg/context/sis_research/article/5553/viewcontent/1_s2.0_S0306457313000514_main.pdf
Institution: Singapore Management University
DOI: 10.1016/j.ipm.2013.05.002
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems
Published online: 2013-11-01