Collaborative online learning of user generated content

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scala...

Bibliographic Details
Main Authors: LI, Guangxia, CHANG, Kuiyu, HOI, Steven C. H., LIU, Wenting, JAIN, Ramesh
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2011
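Subjects: online learning; classification; imbalanced class distribution; Computer Sciences; Databases and Information Systems; Numerical Analysis and Scientific Computing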
Online Access: https://ink.library.smu.edu.sg/sis_research/2349
https://ink.library.smu.edu.sg/context/sis_research/article/3349/viewcontent/p285_li.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3349
record_format dspace
spelling sg-smu-ink.sis_research-3349 2020-04-01T02:02:09Z 2011-10-01T07:00:00Z text application/pdf info:doi/10.1145/2063576.2063622 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic online learning
classification
imbalanced class distribution
Computer Sciences
Databases and Information Systems
Numerical Analysis and Scientific Computing
description We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., samples from a particular useful class could be few and far between compared to those from other classes (the majority classes). In some applications, like spam detection, identification of the minority class often has significantly greater value than that of the majority class. Last but not least, when learning a classification model from a group of users, there is a dilemma: a single classification model trained on the entire corpus may fail to capture personalized characteristics, such as language and writing styles, that are unique to each user. On the other hand, a personalized model dedicated to each user may be inaccurate due to the scarcity of training data, especially at the very beginning, when users have written just a few articles. To overcome these challenges, we propose learning a global model over all users' data, which is then leveraged to continuously refine the individual models through a collaborative online learning approach. The class imbalance problem is addressed via a cost-sensitive learning approach. Experimental results show that our method is effective and scalable for timely classification of user generated content.
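The description above can be illustrated with a minimal sketch. It is not the authors' algorithm: this record does not give the update rule, so the sketch assumes a passive-aggressive (PA-I style) linear learner, a simple interpolation that pulls each user's weights toward the global weights, and per-class misclassification costs that cap the step size. All names and parameters (`CollaborativeLearner`, `pos_cost`, `neg_cost`, `mix`, `c`) are hypothetical.

```python
def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def pa_step(w, x, y, cap):
    """In-place PA-I style update on the hinge loss; `cap` bounds the step."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    norm_sq = sum(v * v for v in x.values())
    if loss == 0.0 or norm_sq == 0.0:
        return  # margin already satisfied, or empty example
    tau = min(cap, loss / norm_sq)
    for f, v in x.items():
        w[f] = w.get(f, 0.0) + tau * y * v


class CollaborativeLearner:
    """Hypothetical sketch: one global model plus one model per user."""

    def __init__(self, pos_cost=5.0, neg_cost=1.0, mix=0.5, c=1.0):
        self.pos_cost = pos_cost  # assumed cost of missing the minority (+1) class
        self.neg_cost = neg_cost  # assumed cost of missing the majority (-1) class
        self.mix = mix            # assumed pull of a user model toward the global one
        self.c = c                # base aggressiveness of each update
        self.global_w = {}
        self.user_w = {}

    def predict(self, user, x):
        # Users without any data yet fall back to the global model.
        w = self.user_w.get(user, self.global_w)
        return 1 if dot(w, x) > 0 else -1

    def update(self, user, x, y):
        # Cost-sensitive step cap: costlier classes may move the model further.
        cap = self.c * (self.pos_cost if y == 1 else self.neg_cost)
        # 1. The global model learns from every user's example.
        pa_step(self.global_w, x, y, cap)
        # 2. The user's model is pulled toward the refreshed global model.
        w = self.user_w.setdefault(user, {})
        for f in set(w) | set(self.global_w):
            w[f] = (1 - self.mix) * w.get(f, 0.0) + self.mix * self.global_w.get(f, 0.0)
        # 3. The user's model is then corrected on the user's own example.
        pa_step(w, x, y, cap)


learner = CollaborativeLearner()
learner.update("alice", {"free": 1.0, "win": 1.0}, 1)        # minority class
learner.update("alice", {"meeting": 1.0, "notes": 1.0}, -1)  # majority class
print(learner.predict("alice", {"free": 1.0, "win": 1.0}))
print(learner.predict("bob", {"free": 1.0, "win": 1.0}))     # new user, global fallback
```

In this sketch the fallback in `predict` is what addresses the scarcity of per-user data at the very beginning, while `mix` trades personalization against shared knowledge. Interpolating over every global feature on each update is chosen for brevity; a scalable variant would restrict the work to the features present in the current example.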