Event-driven data collection and summarization
Online social networks have proliferated at an unprecedented rate. Various platforms have changed the way people learn about information, particularly about ongoing events, which in the past could be known only from mainstream media. Social media platforms have many pleasing properties compared...
Main Author: | Zheng, Xin |
---|---|
Other Authors: | Sun Aixin |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
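Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |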
Online Access: | https://hdl.handle.net/10356/89715 http://hdl.handle.net/10220/47146 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-89715 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-89715 2020-06-23T10:42:47Z Event-driven data collection and summarization Zheng, Xin Sun Aixin School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Doctor of Philosophy 2018-12-21T00:00:21Z 2019-12-06T17:31:50Z 2018-12-21T00:00:21Z 2019-12-06T17:31:50Z 2018 Thesis Zheng, X. (2018). Event-driven data collection and summarization. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/89715 http://hdl.handle.net/10220/47146 10.32657/10220/47146 en 168 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Zheng, Xin Event-driven data collection and summarization |
description |
Online social networks have proliferated at an unprecedented rate. Various platforms have changed the way people learn about information, particularly about ongoing events, which in the past could be known only from mainstream media. Compared with traditional media, social media platforms have many pleasing properties: they are convenient, detailed, fast and interactive. This gives us an opportunity to obtain immediate and detailed information about an event of interest, which decision makers and the public highly value, especially when emergencies and significant events happen.
Social media platforms such as Twitter provide a keyword search function that lets users search for messages containing the query keyword(s). However, the returned results are piecemeal because of the length limit, may be mixed with irrelevant tweets, or may be incomplete because of an inappropriate query. This calls for research on collecting clean and complete event-related messages from social media platforms. In this dissertation, the research is conducted on the Twitter platform. The set of collected event-related tweets can be large. Presenting the data in a concise and representative form helps end users get a general idea of the collected information. Therefore, after collecting event-related tweets, we aim to construct a summary of this large set of data.
Collecting clean and complete event-related tweets from the Twitter Stream is not a trivial problem. The challenges are as follows: (i) the great volume of tweets makes filtering a heavy workload; (ii) tweets are short and noisy, with many abbreviations, dialects and misspellings, which makes it difficult to identify event-related messages, especially when there is not enough training data to distinguish them; (iii) events evolve, so the collection should adapt to the development of events. The methods proposed in this dissertation address these challenges.
As stated above, tweets are noisy and casually written, so it is not suitable to extract tweets directly as a summary. Therefore, we turn to the well-written news articles linked by URLs in tweets, which should be on the same topic as the event-related tweets. We call tweets with URLs linking to news articles linking tweets. News articles report the main information about the event of interest, while tweets can be diverse, including people’s focuses, comments, and other information complementary to the news. We aim to construct a summary based on both the linked news articles and the event-related tweets. The summary should not only highlight the key points in the news articles but also address people’s focuses regarding the event. People’s focuses are usually ignored when only news articles are summarized, but they are important when presenting the core information to people who care about the event.
To sum up, in this dissertation we propose approaches for collecting clean and complete event-related tweets from the Twitter Stream, and unsupervised models for summarizing single and multiple news documents together with their linking tweets. |
author2 |
Sun Aixin |
author_facet |
Sun Aixin Zheng, Xin |
format |
Theses and Dissertations |
author |
Zheng, Xin |
author_sort |
Zheng, Xin |
title |
Event-driven data collection and summarization |
title_sort |
event-driven data collection and summarization |
publishDate |
2018 |
url |
https://hdl.handle.net/10356/89715 http://hdl.handle.net/10220/47146 |
_version_ |
1681058112390299648 |