Event-driven data collection and summarization

Bibliographic Details
Main Author: Zheng, Xin
Other Authors: Sun Aixin
Format: Theses and Dissertations
Language: English
Published: 2018
Subjects:
Online Access: https://hdl.handle.net/10356/89715
http://hdl.handle.net/10220/47146
Institution: Nanyang Technological University
Description
Summary: Online social networks have proliferated at an unprecedented rate. These platforms have changed the way people learn about information, particularly about ongoing events, which in the past could be followed only through mainstream media. Compared with traditional media, social media platforms are convenient, detailed, fast, and interactive. They therefore offer an opportunity to obtain immediate and detailed information about an event of interest, which decision makers and the public value highly, especially during emergencies and other significant events. Platforms such as Twitter provide a keyword search function that returns messages containing the query keyword(s). However, the returned results are piecemeal because of the length limit on messages, may be mixed with irrelevant tweets, or may be incomplete because the query was poorly chosen. This calls for research on collecting clean and complete event-related messages from social media platforms. The research in this dissertation is conducted on the Twitter platform. The set of collected event-related tweets can be large, and presenting the data in a concise and representative form helps end users gain a general idea of the collected information. Therefore, after collecting event-related tweets, we aim to construct a summary of this large data set. Collecting clean and complete event-related tweets from Twitter Stream is not a trivial problem. The challenges are as follows: (i) the great volume of tweets makes filtering a heavy workload; (ii) tweets are short and noisy, with many abbreviations, dialects, and misspellings, which makes it difficult to identify event-related messages, especially when there is not enough training data to distinguish them; (iii) events evolve, and the collection should adapt to their development. The methods proposed in this dissertation address these challenges.
As stated above, tweets are noisy and casually written, so extracting tweets directly as a summary is not suitable. We therefore turn to the well-written news articles linked by URLs in tweets, which should cover the same topic as the event-related tweets. We call tweets whose URLs link to news articles linking tweets. The news articles report the main information about the event of interest, while tweets can be diverse, covering people’s focuses, comments, and other information complementary to the news. We aim to construct a summary based on both the linked news articles and the event-related tweets. The summary should not only highlight the key points in the news articles but also address people’s focuses regarding the event. People’s focuses are usually ignored when only news articles are summarized, but they are important when presenting the core information to people who care about the event. To sum up, in this dissertation we propose approaches for collecting clean and complete event-related tweets from Twitter Stream, and unsupervised models for summarizing single and multiple news documents with linking tweets.
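
As a rough illustration of the "linking tweet" notion described in the summary, the sketch below flags tweets whose URLs point to a news site. It is not the dissertation's method: the domain list NEWS_DOMAINS and the helper names extract_urls and is_linking_tweet are hypothetical, and the sketch assumes that shortened links in the tweet text have already been expanded to their final URLs.

```python
import re
from urllib.parse import urlparse

# Hypothetical set of news domains, for illustration only.
# A real collection would need a curated, much larger list.
NEWS_DOMAINS = {"bbc.com", "reuters.com"}

# Simple pattern for http(s) URLs; assumes links are already expanded.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_PATTERN.findall(tweet_text)]


def is_linking_tweet(tweet_text, news_domains=NEWS_DOMAINS):
    """Return True if any URL in the tweet points to a known news domain."""
    for url in extract_urls(tweet_text):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # Match the domain itself or any of its subdomains.
        if host in news_domains or any(host.endswith("." + d) for d in news_domains):
            return True
    return False


tweets = [
    "Flood update https://www.bbc.com/news/world-123",
    "so much rain today",
]
linking = [t for t in tweets if is_linking_tweet(t)]
```

Here only the first tweet would be kept as a linking tweet. Matching on a fixed domain list is a deliberately simple choice; it trades coverage for precision and would miss news sources outside the list.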