Mining user-created content for document summarization and event detection

Empowered to create content with advanced Web services and easy-to-publish tools, today’s Web users are creating content and contributing knowledge through various Web activities. As a result, the Web abounds with user-created content. With the aim to derive collective intell...

Bibliographic Details
Main Author: Hu, Meishan.
Other Authors: Sun Aixin
Format: Theses and Dissertations
Language: English
Published: 2011
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems
Online Access: http://hdl.handle.net/10356/44560
Institution: Nanyang Technological University
id sg-ntu-dr.10356-44560
record_format dspace
spelling sg-ntu-dr.10356-44560 2023-03-04T00:34:18Z Mining user-created content for document summarization and event detection Hu, Meishan. Sun Aixin School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems Doctor of Philosophy 2011-06-02T06:22:52Z 2011-06-02T06:22:52Z 2011 2011 Thesis http://hdl.handle.net/10356/44560 en 164 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems
description Empowered to create content with advanced Web services and easy-to-publish tools, today’s Web users are creating content and contributing knowledge through various Web activities. As a result, the Web abounds with user-created content. Aiming to derive collective intelligence and wisdom-of-the-crowd, we conducted research on knowledge mining from user-created content. Our research focused on three forms of user-created content: comments, blogs, and search queries. As one of the important features of blogs, comments written by readers are believed to represent readers’ feedback about documents. In our user study on blog reading, we found that human summarizers selected significantly different sets of sentences from blog posts before and after reading the comments. Hence, we proposed and studied the problem of comments-oriented document summarization, whose goal is to extract a subset of sentences from a given document that best reflects the topics not only presented in the document but also discussed in the associated comments. To generate a comments-oriented summary, we proposed and evaluated a number of methods under two separate approaches. In the feature-scoring approach, we viewed words as the features that bridge the semantics of the document and its associated comments, and scored sentences according to the words they contain. As the important containers of these words, the comments were scored with either a graph-based or a tensor-based scoring method built on three relations (topic, quotation, and mention) identified among comments. In the language-modeling approach, we viewed the need for a summary as an information need and estimated a language model of the comments-oriented summary from the document language model and the comments language model. Sentences were then ranked through either Odds Ratio selection or Negative Kullback-Leibler Divergence selection.
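The language-modeling approach in the description can be illustrated with a small sketch of Negative Kullback-Leibler Divergence selection. The record does not say how the summary language model is estimated, so this sketch assumes a linear interpolation of the document and comments unigram models (weight `lam`) and additive smoothing (constant `alpha`). All function and parameter names here are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over vocab.

    alpha is an assumed smoothing constant, not a value from the thesis.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def summary_model(doc_lm, com_lm, lam=0.5):
    """Assumed estimate: interpolate the document and comments models."""
    return {w: lam * doc_lm[w] + (1 - lam) * com_lm[w] for w in doc_lm}


def neg_kl(target, candidate):
    """Return -KL(target || candidate); values closer to 0 match better."""
    return -sum(p * math.log(p / candidate[w])
                for w, p in target.items() if p > 0)


def rank_sentences(sentences, comments, lam=0.5):
    """Rank tokenized sentences by -KL(summary model || sentence model).

    Returns sentence indices, best match first.
    """
    doc_tokens = [t for s in sentences for t in s]
    com_tokens = [t for c in comments for t in c]
    vocab = sorted(set(doc_tokens) | set(com_tokens))
    target = summary_model(unigram_model(doc_tokens, vocab),
                           unigram_model(com_tokens, vocab), lam)
    scored = [(neg_kl(target, unigram_model(s, vocab)), i)
              for i, s in enumerate(sentences)]
    return [i for _, i in sorted(scored, reverse=True)]


if __name__ == "__main__":
    post = [["the", "camera", "has", "a", "great", "lens"],
            ["battery", "life", "is", "short"]]
    replies = [["battery", "drains", "fast"],
               ["agree", "battery", "life", "is", "poor"]]
    print(rank_sentences(post, replies))
```

Lowering `lam` shifts weight toward the comments model, so sentences that echo what commenters discuss move up the ranking. Setting `lam` to 1 reduces the sketch to ranking against the document model alone.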
author2 Sun Aixin
format Theses and Dissertations
author Hu, Meishan.
title Mining user-created content for document summarization and event detection
publishDate 2011
url http://hdl.handle.net/10356/44560