Event-driven data collection and summarization

Online social networks have experienced an unprecedented proliferation. These platforms have changed the way people learn about information, particularly about ongoing events, which in the past could be learned of only through mainstream media. Social media platforms have several appealing properties compared...

Bibliographic Details
Main Author: Zheng, Xin
Other Authors: Sun Aixin
Format: Theses and Dissertations
Language: English
Published: 2018
Subjects:
Online Access: https://hdl.handle.net/10356/89715
http://hdl.handle.net/10220/47146
Institution: Nanyang Technological University
id sg-ntu-dr.10356-89715
record_format dspace
spelling sg-ntu-dr.10356-89715 2020-06-23T10:42:47Z Event-driven data collection and summarization Zheng, Xin Sun Aixin School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Doctor of Philosophy 2018-12-21T00:00:21Z 2019-12-06T17:31:50Z 2018 Thesis Zheng, X. (2018). Event-driven data collection and summarization. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/89715 http://hdl.handle.net/10220/47146 10.32657/10220/47146 en 168 p. application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description Online social networks have experienced an unprecedented proliferation. These platforms have changed the way people learn about information, particularly about ongoing events, which in the past could be learned of only through mainstream media. Compared with traditional media, social media platforms have several appealing properties: they are convenient, detailed, fast and interactive. This provides an opportunity to obtain immediate and detailed information about an event of interest, which is highly valued by decision makers and the public, especially when emergencies and significant events happen. Social media platforms such as Twitter provide a keyword search function that lets users search for messages containing the query keyword(s). However, the returned results are piecemeal because of the length limit on messages, may be mixed with irrelevant tweets, or may be incomplete because of an inappropriate query. This calls for research on collecting clean and complete event-related messages from social media platforms. In this dissertation, the research is conducted on the Twitter platform. The collected event-related tweets can form a large set. Presenting the data in a concise and representative form helps end users get a general idea of the collected information. Therefore, after collecting event-related tweets, we aim to construct a summary of this large set of data. Collecting clean and complete event-related tweets from the Twitter Stream is not a trivial problem. The challenges are as follows: (i) the great volume of tweets makes the filtering process a heavy workload; (ii) tweets are short and noisy, with many abbreviations, dialects and misspellings, which makes it difficult to identify event-related messages, especially when there is not enough training data to distinguish them; (iii) events evolve, and the collection should adapt to their development. The methods proposed in this dissertation address these challenges.
As stated above, tweets are noisy and casually written, so extracting tweets directly as a summary is not suitable. Therefore, we turn to the well-written news articles linked by URLs in tweets, which should be on the same topic as the event-related tweets. We call tweets with URLs linking to news articles linking tweets. The news articles report the main information about the event of interest, while tweets can be diverse, including people’s focuses, comments, and other information complementary to the news. We aim to construct a summary based on both the linked news articles and the event-related tweets. The summary should not only highlight the key points in the news articles but also address people’s focuses regarding the event. People’s focuses are usually ignored when only news articles are summarized, but they are important when presenting the core information to people who care about the event. To sum up, in this dissertation, we propose approaches for collecting clean and complete event-related tweets from the Twitter Stream, and unsupervised models for summarizing single and multiple news documents with linking tweets.
author2 Sun Aixin
author_facet Sun Aixin
Zheng, Xin
format Theses and Dissertations
author Zheng, Xin
author_sort Zheng, Xin
title Event-driven data collection and summarization
publishDate 2018
url https://hdl.handle.net/10356/89715
http://hdl.handle.net/10220/47146
_version_ 1681058112390299648