Harnessing online social media to deal with information overload.

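In online social media, users have become information creators and disseminators through their active interplay with information items and other users, rather than merely the information consumers they were a decade ago. This collaborative and active production and dissemination of information further aggravates the problem of information overload on the World Wide Web (WWW). Existing approaches to information retrieval (IR) and natural language processing (NLP) tasks often have response times that Web users find intolerable. Moreover, the numerous interactions between users and information items are giving rise to new kinds of information needs, such as opinion mining, event detection, and summarization. However, existing IR technologies (based on the bag-of-words model) and NLP technologies (based on linguistic features) often fail to meet these emerging information needs. On the other hand, people participate in online social media to share stories and photos with their friends, vote, leave opinions, tag web pages, and so on. The digital footprints of these behaviors make online social media a semantic resource that we can exploit to better understand and organize the astronomical volume of information.

In this dissertation, we first analyze online social media as a multi-dimensional social network, taking Wikipedia as a case study. We find that, because the network exposes multiple relations from different perspectives, focusing on only one specific relation can lead to biased or even wrong conclusions. Traditional information retrieval approaches are mainly keyword-based and built on the bag-of-words model: they ignore word order in the text and measure relevance by the presence of keywords. We propose a generalized framework for word sense disambiguation based on Wikipedia. The framework enables effective and efficient disambiguation by relating keyphrases (i.e., n-grams) in documents to their appropriate concepts in Wikipedia, where a concept is defined as a Wikipedia article. The framework is applicable to documents in different languages and with different settings. By adopting this disambiguation method, we can represent a textual document by the Wikipedia concepts it covers. Based on this concept model, we study semantic tag recommendation for web pages by exploring the semantic relations between tags and concepts that underlie human annotation activities.

Web users also take part in information generation by commenting on news articles, sharing stories, publishing opinions in microblog posts, and so on. However, user-generated text (e.g., comments, tweets) is often short and written in a free style, with grammatical errors and informal abbreviations. These adverse features degrade the performance of existing algorithms for many online social media tasks, such as named entity recognition and event detection. We propose an unsupervised approach for named entity recognition in a targeted Twitter stream. Within this work, we develop a tweet segmentation algorithm that splits each tweet into non-overlapping phrases, called tweet segments. Inspired by the semantic units that tweet segmentation produces, we further propose an effective and scalable event detection algorithm for tweets based on tweet segments.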
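The abstract describes a tweet segmentation step that splits each tweet into non-overlapping phrases. As a minimal sketch of that general idea, the dynamic program below picks the split that maximizes the summed score of its segments. The scoring function `toy_score` and its phrase table are hypothetical stand-ins; this record does not say how the thesis scores candidate segments, so this is an illustration under stated assumptions, not the author's algorithm.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, ...]


def segment_tweet(
    words: List[str],
    score: Callable[[Segment], float],
    max_len: int = 3,
) -> List[Segment]:
    """Split `words` into contiguous, non-overlapping segments of at most
    `max_len` words, maximising the sum of `score` over the segments."""
    n = len(words)
    # best[i] holds (best total score for words[:i], start of its last segment).
    best: List[Tuple[float, int]] = [(0.0, 0)] + [(float("-inf"), 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            total = best[start][0] + score(tuple(words[start:end]))
            if total > best[end][0]:
                best[end] = (total, start)
    # Follow the back-pointers from the end of the tweet to recover segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = best[end][1]
        segments.append(tuple(words[start:end]))
        end = start
    return segments[::-1]


# Hypothetical phrase table standing in for a real segment-quality measure.
PHRASE_SCORES: Dict[Segment, float] = {
    ("new", "york"): 2.0,
    ("new", "york", "city"): 3.5,
    ("marathon",): 1.0,
}


def toy_score(segment: Segment) -> float:
    """Known phrases use the table; other single words get a small default,
    and unknown multi-word spans are penalised."""
    if segment in PHRASE_SCORES:
        return PHRASE_SCORES[segment]
    return 0.1 if len(segment) == 1 else -1.0


print(segment_tweet("new york city marathon today".split(), toy_score))
# [('new', 'york', 'city'), ('marathon',), ('today',)]
```

Because every single word is always a valid segment, each prefix of the tweet has a finite best score, so the back-pointer walk always covers the whole tweet exactly once.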

Bibliographic Details
Main Author: Li, Chenliang.
Other Authors: Anwitaman Datta
Format: Theses and Dissertations
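Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Online Access: https://hdl.handle.net/10356/54827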
Institution: Nanyang Technological University
id sg-ntu-dr.10356-54827
record_format dspace
Record last modified: 2023-03-04T00:41:45Z
Contributors: Anwitaman Datta; Sun Aixin
School: School of Computer Engineering
Research centre: Centre for Advanced Information Systems
Degree: DOCTOR OF PHILOSOPHY (SCE)
Date added: 2013-08-30T03:28:43Z
Type: Thesis
Citation: Li, C. (2013). Harnessing online social media to deal with information overload. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/54827
DOI: 10.32657/10356/54827
Extent: 201 p.
File format: application/pdf
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
description In online social media, users have become information creators and disseminators through their active interplay with information items and other users, rather than merely the information consumers they were a decade ago. This collaborative and active production and dissemination of information further aggravates the problem of information overload on the World Wide Web (WWW). Existing approaches to information retrieval (IR) and natural language processing (NLP) tasks often have response times that Web users find intolerable. Moreover, the numerous interactions between users and information items are giving rise to new kinds of information needs, such as opinion mining, event detection, and summarization. However, existing IR technologies (based on the bag-of-words model) and NLP technologies (based on linguistic features) often fail to meet these emerging information needs. On the other hand, people participate in online social media to share stories and photos with their friends, vote, leave opinions, tag web pages, and so on. The digital footprints of these behaviors make online social media a semantic resource that we can exploit to better understand and organize the astronomical volume of information. In this dissertation, we first analyze online social media as a multi-dimensional social network, taking Wikipedia as a case study. We find that, because the network exposes multiple relations from different perspectives, focusing on only one specific relation can lead to biased or even wrong conclusions. Traditional information retrieval approaches are mainly keyword-based and built on the bag-of-words model: they ignore word order in the text and measure relevance by the presence of keywords. We propose a generalized framework for word sense disambiguation based on Wikipedia. The framework enables effective and efficient disambiguation by relating keyphrases (i.e., n-grams) in documents to their appropriate concepts in Wikipedia, where a concept is defined as a Wikipedia article. The framework is applicable to documents in different languages and with different settings. By adopting this disambiguation method, we can represent a textual document by the Wikipedia concepts it covers. Based on this concept model, we study semantic tag recommendation for web pages by exploring the semantic relations between tags and concepts that underlie human annotation activities. Web users also take part in information generation by commenting on news articles, sharing stories, publishing opinions in microblog posts, and so on. However, user-generated text (e.g., comments, tweets) is often short and written in a free style, with grammatical errors and informal abbreviations. These adverse features degrade the performance of existing algorithms for many online social media tasks, such as named entity recognition and event detection. We propose an unsupervised approach for named entity recognition in a targeted Twitter stream. Within this work, we develop a tweet segmentation algorithm that splits each tweet into non-overlapping phrases, called tweet segments. Inspired by the semantic units that tweet segmentation produces, we further propose an effective and scalable event detection algorithm for tweets based on tweet segments.
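The abstract's concept model relates n-grams in a document to Wikipedia articles. As a rough illustration only, the sketch below links phrases with a simple baseline that mixes a "commonness" prior (how often an anchor text points to each article) with overlap against unambiguous context concepts. The anchor counts, link sets, the `alpha` weighting, and the function names `extract_phrases` and `link_concepts` are all hypothetical; this is a generic entity-linking baseline, not the generalized framework proposed in the thesis.

```python
from typing import Dict, List, Set

# Hypothetical anchor statistics: surface form -> {Wikipedia article: link count}.
ANCHORS: Dict[str, Dict[str, int]] = {
    "jaguar": {"Jaguar_Cars": 60, "Jaguar": 40},
    "savanna": {"Savanna": 10},
}
# Hypothetical out-links of each article, used as a crude relatedness signal.
LINKS: Dict[str, Set[str]] = {
    "Jaguar": {"Savanna", "Big_cat"},
    "Jaguar_Cars": {"Coventry"},
}


def extract_phrases(tokens: List[str], anchors: Dict[str, Dict[str, int]], max_n: int = 3) -> List[str]:
    """Greedy, left-to-right longest match of non-overlapping n-grams found in `anchors`."""
    phrases: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in anchors:
                phrases.append(gram)
                i += n
                break
        else:  # no known phrase starts at position i
            i += 1
    return phrases


def link_concepts(text: str, anchors: Dict[str, Dict[str, int]],
                  links: Dict[str, Set[str]], alpha: float = 0.5) -> Dict[str, str]:
    """Map each detected phrase to the article maximising
    alpha * commonness + (1 - alpha) * context relatedness."""
    phrases = extract_phrases(text.lower().split(), anchors)
    # Phrases with a single candidate article are treated as unambiguous context.
    context = {next(iter(anchors[p])) for p in phrases if len(anchors[p]) == 1}
    result: Dict[str, str] = {}
    for phrase in phrases:
        candidates = anchors[phrase]
        total = sum(candidates.values())
        best_concept, best_score = "", float("-inf")
        for concept, count in candidates.items():
            others = context - {concept}
            related = len(links.get(concept, set()) & others) / len(others) if others else 0.0
            score = alpha * count / total + (1 - alpha) * related
            if score > best_score:
                best_concept, best_score = concept, score
        result[phrase] = best_concept
    return result


print(link_concepts("Jaguar speed on the savanna", ANCHORS, LINKS))
# {'jaguar': 'Jaguar', 'savanna': 'Savanna'}
```

Without an unambiguous context word such as "savanna", the commonness prior alone decides, so `link_concepts("jaguar dealership", ANCHORS, LINKS)` picks `Jaguar_Cars` instead.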