Mining user-created content for document summarization and event detection
Empowered with the ability to create content using advanced Web services and easy-to-publish tools, today’s Web users are creating content and contributing knowledge through various Web activities. As a result, the Web is abundant with user-created content. With the aim of deriving collective intell...
Main Author: | Hu, Meishan. |
---|---|
Other Authors: | Sun Aixin |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2011 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems |
Online Access: | http://hdl.handle.net/10356/44560 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-44560 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-44560 2023-03-04T00:34:18Z Mining user-created content for document summarization and event detection Hu, Meishan. Sun Aixin School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems Doctor of Philosophy 2011-06-02T06:22:52Z 2011-06-02T06:22:52Z 2011 2011 Thesis http://hdl.handle.net/10356/44560 en 164 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems |
description |
Empowered with the ability to create content using advanced Web services and easy-to-publish tools, today’s Web users are creating content and contributing knowledge through various Web activities. As a result, the Web is abundant with user-created content. With the aim of deriving collective intelligence and the wisdom of the crowd, we conducted research in knowledge mining from user-created content. Our research focused on three forms of user-created content: comments, blogs, and search queries.
Comments, one of the important features of blogs, are written by readers and are believed to represent readers’ feedback on the documents. In our user study on blog reading, we found that human summarizers selected significantly different sets of sentences from blog posts before and after reading the comments. Hence, we proposed and studied the problem of comments-oriented document summarization, whose goal is to extract from a given document the subset of sentences that best reflects the topics not only presented in the document but also discussed in its associated comments.
To generate a comments-oriented summary, we proposed and evaluated a number of methods under two separate approaches. In the feature-scoring approach, we view words as the features that bridge the semantics of the document and its associated comments, and we score sentences according to the words they contain. Because comments are the important containers of these words, the comments themselves are scored with either a graph-based or a tensor-based scoring method, built on three relations identified among the comments: topic, quotation, and mention. In the language-modeling approach, we treat the desired summary as an information need and estimate a language model of the comments-oriented summary from the document language model and the comments language model. Sentences are then ranked by either Odds Ratio selection or Negative Kullback-Leibler Divergence selection. |
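The two approaches in the abstract can be illustrated with a small sketch. The record gives no formulas, so every modeling choice below is an assumption, not the thesis's method: a PageRank-style power iteration stands in for the graph-based comment scoring (the tensor-based variant is not shown); the comment graph's `edges` are a hypothetical input standing for topic, quotation, or mention links; a word's weight is the sum of the scores of the comments containing it; and a sentence's score is the sum of its distinct words' weights. For the language-modeling approach, the summary model is assumed to be a linear interpolation (hypothetical weight `lam`) of additively smoothed document and comments unigram models, and sentences are ranked by negative KL divergence from it; Odds Ratio selection is not shown. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed, deliberately naive tokenizer: lowercase alphabetic runs only
    # (no stemming or stop-word removal).
    return re.findall(r"[a-z]+", text.lower())


# ---- Feature-scoring sketch (assumed PageRank-style graph scoring) ----
def score_comments(num_comments, edges, damping=0.85, iters=50):
    """Power iteration over an undirected comment graph.

    `edges` is a hypothetical list of (i, j) pairs linking comments that
    share a topic, quote each other, or mention each other."""
    neighbors = [set() for _ in range(num_comments)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    scores = [1.0 / num_comments] * num_comments
    for _ in range(iters):
        new = [(1 - damping) / num_comments] * num_comments
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                share = damping * scores[i] / len(nbrs)
                for j in nbrs:
                    new[j] += share
            else:
                # A comment with no links spreads its mass uniformly.
                for j in range(num_comments):
                    new[j] += damping * scores[i] / num_comments
        scores = new
    return scores


def word_weights(comments, comment_scores):
    # A word's weight is the total score of the comments that contain it.
    weights = Counter()
    for text, score in zip(comments, comment_scores):
        for word in set(tokenize(text)):
            weights[word] += score
    return weights


def feature_score_sentences(sentences, weights):
    # A sentence's score is the sum of the weights of its distinct words.
    return [sum(weights[w] for w in set(tokenize(s))) for s in sentences]


# ---- Language-modeling sketch (assumed interpolation + negative KL) ----
def unigram_lm(tokens, vocab, mu=0.01):
    # Additive smoothing keeps every vocabulary word's probability above zero.
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def neg_kl_rank(sentences, comments, lam=0.5):
    """Return sentence indices ordered by -KL(summary_lm || sentence_lm)."""
    doc_tokens = [t for s in sentences for t in tokenize(s)]
    com_tokens = [t for c in comments for t in tokenize(c)]
    vocab = set(doc_tokens) | set(com_tokens)
    doc_lm = unigram_lm(doc_tokens, vocab)
    com_lm = unigram_lm(com_tokens, vocab)
    summary_lm = {w: lam * doc_lm[w] + (1 - lam) * com_lm[w] for w in vocab}
    neg_kl = []
    for s in sentences:
        sent_lm = unigram_lm(tokenize(s), vocab)
        kl = sum(p * math.log(p / sent_lm[w]) for w, p in summary_lm.items())
        neg_kl.append(-kl)
    # Largest negative KL (smallest divergence) first; ties keep input order.
    return sorted(range(len(sentences)), key=lambda i: neg_kl[i], reverse=True)


if __name__ == "__main__":
    doc = [
        "The new phone has a great camera",
        "Battery life is disappointing",
        "The launch event was held in March",
    ]
    comments = [
        "Battery drains too fast",
        "Agree, battery life is bad",
        "Camera looks great though",
    ]
    edges = [(0, 1)]  # hypothetical: comment 1 quotes comment 0
    weights = word_weights(comments, score_comments(len(comments), edges))
    print(feature_score_sentences(doc, weights))
    print(neg_kl_rank(doc, comments))
```

In this toy example, the sentence about battery life scores highest under both sketches, because it shares the most heavily weighted words with the linked comments and is closest to the interpolated summary model.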
author2 |
Sun Aixin |
author_facet |
Sun Aixin; Hu, Meishan. |
format |
Theses and Dissertations |
author |
Hu, Meishan. |
title |
Mining user-created content for document summarization and event detection |
publishDate |
2011 |
url |
http://hdl.handle.net/10356/44560 |