Online tweet summarization : a topic modelling-based approach
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/66948 |
Institution: | Nanyang Technological University |
Summary: | Twitter is an online social networking service in which users post short messages called “tweets”. Users can follow other users, forming a network in which each user receives all the tweets posted by the accounts they follow. Like many other social networking services, such as Facebook and Instagram, Twitter has used a reverse chronological timeline since its launch. This timeline is inadequate for two main reasons: (a) the most recent posts may repeat the same information, and (b) it can be difficult for users to see the overall picture of the topics being discussed across the entire collection of recent posts.
To overcome the limitations of the reverse chronological timeline, we present an alternative approach based on topic modelling. Topic modelling is a text mining technique for identifying hidden topics in a collection of text documents; for this project we adopted Latent Dirichlet Allocation (LDA), the most basic and widely used topic model. Beyond identifying the most salient topics in a collection of tweets, we also examined and proposed solutions to related issues: ranking tweets by their relevance to a topic, and generating topic labels and topic summaries.
A pilot user study with 20 participants was conducted to evaluate the proposed solution. The findings show that the topic modelling-based approach outperforms the reverse chronological baseline in many areas, and they highlight the feasibility of the approach. The user study also helped to identify areas of future work that could further enhance the proposed solution. |
---|
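
The summary names Latent Dirichlet Allocation and tweet ranking but this record does not describe how the project implemented them. As a rough illustration only, the sketch below fits a textbook LDA model with collapsed Gibbs sampling and then orders tweets by their estimated share of a chosen topic. Everything in it is an assumption made for illustration: the function names `fit_lda`, `rank_tweets` and `top_words`, the hyperparameters `alpha` and `beta`, the whitespace tokenisation, the use of document-topic proportion as the relevance score, and the use of top words as a stand-in for a topic label.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic, topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: theta is per-document topic mix, phi is per-topic word mix.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


def rank_tweets(theta, topic):
    """Return tweet indices ordered by descending proportion of the given topic."""
    return sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)


def top_words(vocab, phi, topic, n=3):
    """Return the n most probable words of a topic, a crude stand-in for a label."""
    order = sorted(range(len(vocab)), key=lambda v: phi[topic][v], reverse=True)
    return [vocab[v] for v in order[:n]]


# Hypothetical toy tweets, tokenised by lowercasing and splitting on whitespace.
tweets = [
    "great goal in the match tonight",
    "the team played a great match",
    "election poll shows a close vote",
    "vote count from the election poll",
]
docs = [t.lower().split() for t in tweets]
vocab, theta, phi = fit_lda(docs, num_topics=2)
for topic in range(2):
    print(topic, top_words(vocab, phi, topic), rank_tweets(theta, topic))
```

Grouping tweets by topic and showing the highest-ranked ones first is one way such a model could replace a reverse chronological listing. A practical system would more likely rely on an established library and on tweet-specific preprocessing such as stop-word removal, which this sketch omits.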