On Profiling Blogs with Representative Entries

With the explosive growth of blogs, information seeking in the blogosphere becomes more and more challenging. One example task is to find the most relevant topical blogs for a given query or an existing blog. Such a task requires a concise representation of blogs for effective and efficient searching a...

Bibliographic Details
Main Authors: ZHUANG, Jinfeng, HOI, Steven C. H., SUN, Aixin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2008
Subjects: Blog profiling; Entry selection; Blog classification; Computer Sciences; Social Media
Online Access: https://ink.library.smu.edu.sg/sis_research/2405
https://ink.library.smu.edu.sg/context/sis_research/article/3405/viewcontent/p55_zhuang.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-3405
record_format dspace
spelling sg-smu-ink.sis_research-3405
2016-09-06T07:50:22Z
On Profiling Blogs with Representative Entries
ZHUANG, Jinfeng
HOI, Steven C. H.
SUN, Aixin
2008-07-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/2405
info:doi/10.1145/1390749.1390759
https://ink.library.smu.edu.sg/context/sis_research/article/3405/viewcontent/p55_zhuang.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Blog profiling
Entry selection
Blog classification
Computer Sciences
Social Media
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Blog profiling
Entry selection
Blog classification
Computer Sciences
Social Media
spellingShingle Blog profiling
Entry selection
Blog classification
Computer Sciences
Social Media
ZHUANG, Jinfeng
HOI, Steven C. H.
SUN, Aixin
On Profiling Blogs with Representative Entries
description With the explosive growth of blogs, information seeking in the blogosphere becomes more and more challenging. One example task is to find the most relevant topical blogs for a given query or an existing blog. Such a task requires a concise representation of blogs for effective and efficient searching and matching. In this paper, we investigate a new problem of profiling a blog by choosing the m most representative entries from the blog, where m is a predefined, application-dependent number. With the set of selected representative entries, applications on blogs avoid handling the hundreds or even thousands of entries (or posts) associated with each blog, which are updated frequently and are often noisy. To guide the selection of the most representative entries, we propose three principles: anomaly, representativeness, and diversity. Based on these principles, we propose a greedy yet very efficient entry selection algorithm. To evaluate the entry selection algorithms, we adapt an extrinsic evaluation methodology from document summarization research. Specifically, we evaluate the proposed entry selection algorithms by examining their blog classification accuracies. Across a number of different classification methods, our empirical results show that using fewer than 20 representative entries per blog achieves classification accuracy comparable to that of using all entries.
format text
author ZHUANG, Jinfeng
HOI, Steven C. H.
SUN, Aixin
author_facet ZHUANG, Jinfeng
HOI, Steven C. H.
SUN, Aixin
author_sort ZHUANG, Jinfeng
title On Profiling Blogs with Representative Entries
title_short On Profiling Blogs with Representative Entries
title_full On Profiling Blogs with Representative Entries
title_fullStr On Profiling Blogs with Representative Entries
title_full_unstemmed On Profiling Blogs with Representative Entries
title_sort on profiling blogs with representative entries
publisher Institutional Knowledge at Singapore Management University
publishDate 2008
url https://ink.library.smu.edu.sg/sis_research/2405
https://ink.library.smu.edu.sg/context/sis_research/article/3405/viewcontent/p55_zhuang.pdf
_version_ 1770572135879671808
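
Illustrative sketch (not part of the record). The abstract names three principles (anomaly, representativeness, diversity) and a greedy selection of m entries, but it gives no formulas, so the Python below is only one plausible reading and should not be taken as the authors' algorithm. It assumes bag-of-words vectors with cosine similarity, treats "representativeness" as similarity to the blog's overall term centroid, treats "anomaly" as a filter that drops entries too far from that centroid, and treats "diversity" as a penalty for similarity to entries already chosen. The names `select_entries`, `anomaly_threshold`, and `diversity_weight` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_entries(entries, m, anomaly_threshold=0.1, diversity_weight=0.5):
    """Greedily pick up to m entry indices (assumed scoring, see lead-in)."""
    vectors = [tf_vector(e) for e in entries]

    # Blog centroid: summed term frequencies over all entries.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    # Representativeness (assumption): similarity of each entry to the centroid.
    rep = [cosine(v, centroid) for v in vectors]

    # Anomaly (assumption): discard entries that barely resemble the blog.
    candidates = [i for i, r in enumerate(rep) if r >= anomaly_threshold]

    selected = []
    while candidates and len(selected) < m:
        def score(i):
            # Diversity (assumption): penalize closeness to already-selected entries.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return (1 - diversity_weight) * rep[i] - diversity_weight * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    blog = [
        "python code tips",
        "python code tips",
        "python testing code",
        "holiday beach photos",
    ]
    print(select_entries(blog, m=2, anomaly_threshold=0.5))
```

In this toy blog, the duplicated post is passed over in favor of a different on-topic post, and the off-topic post is filtered out before selection. A downstream classifier would then be trained on the selected entries only, which is the extrinsic evaluation setup the abstract describes.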