Harnessing online social media to deal with information overload.

In online social media, users become information creators and disseminators through active interplay with information items and other users, rather than being merely the information consumers they were a decade ago. This kind of information production and dissemination in a collaborative and active manner fu...

Bibliographic Details
Main Author:	Li, Chenliang.
Other Authors:	Anwitaman Datta
Format:	Theses and Dissertations
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Online Access:	https://hdl.handle.net/10356/54827
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-54827
record_format	dspace
spelling	sg-ntu-dr.10356-54827 2023-03-04T00:41:45Z Harnessing online social media to deal with information overload. Li, Chenliang. Anwitaman Datta Sun Aixin School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications In online social media, users become information creators and disseminators through active interplay with information items and other users, rather than being merely the information consumers they were a decade ago. This collaborative and active mode of information production and dissemination further aggravates the problem of information overload on the World Wide Web (WWW). Existing approaches to information retrieval (IR) and natural language processing (NLP) tasks often have response times that Web users find intolerable. Moreover, given the numerous interactions between users and information items, new kinds of information needs are emerging, such as opinion mining, event detection, and summarization. However, existing IR technologies (based on the bag-of-words model) and NLP technologies (based on linguistic features) often fail to satisfy Web users' emerging information needs. On the other hand, people participate in online social media to share stories and photos with their friends, vote and leave opinions, tag web pages, and so on. The digital footprints of these behaviors make online social media a semantic resource that we can exploit to better understand and organize this astronomical volume of information. In this dissertation, we first analyze online social media as a multi-dimensional social network, taking Wikipedia as a case study. We find that, given the multiple relations exposed from different perspectives in the network, focusing on only one specific relation could lead to biased or even wrong conclusions. Traditional information retrieval approaches are mainly keyword based and built on the bag-of-words model: they ignore word order in the text and measure relevance by the presence of keywords. We propose a generalized framework for word sense disambiguation based on Wikipedia. The proposed framework enables effective and efficient disambiguation by relating keyphrases (i.e., n-grams) in documents to their appropriate concepts in Wikipedia, where a concept is defined as a Wikipedia article. The framework is applicable to documents in different languages and under different settings. By adopting the disambiguation method, we can represent a textual document by the Wikipedia concepts it covers. We study the semantic tag recommendation task for web pages based on this concept model by exploring the semantic relations between tags and concepts that underlie human annotation activities. Web users participate in the information generation process by commenting on news articles, sharing stories, publishing opinions in microblog posts, and so on. However, user-generated content (e.g., comments, tweets) is often short and written in a free style, with grammatical errors and informal abbreviations. These adverse features degrade the performance of existing algorithms on many online social media tasks, such as named entity recognition and event detection. We propose an unsupervised approach for named entity recognition in a targeted Twitter stream. Within this work, we develop a tweet segmentation algorithm that splits each tweet into non-overlapping phrases, called tweet segments. Inspired by the semantic units produced by tweet segmentation, we further propose an effective and scalable algorithm for event detection in tweets based on tweet segments. DOCTOR OF PHILOSOPHY (SCE) 2013-08-30T03:28:43Z 2013-08-30T03:28:43Z 2013 2013 Thesis Li, C. (2013). Harnessing online social media to deal with information overload. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/54827 10.32657/10356/54827 en 201 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications Li, Chenliang. Harnessing online social media to deal with information overload.
description	In online social media, users become information creators and disseminators through active interplay with information items and other users, rather than being merely the information consumers they were a decade ago. This collaborative and active mode of information production and dissemination further aggravates the problem of information overload on the World Wide Web (WWW). Existing approaches to information retrieval (IR) and natural language processing (NLP) tasks often have response times that Web users find intolerable. Moreover, given the numerous interactions between users and information items, new kinds of information needs are emerging, such as opinion mining, event detection, and summarization. However, existing IR technologies (based on the bag-of-words model) and NLP technologies (based on linguistic features) often fail to satisfy Web users' emerging information needs. On the other hand, people participate in online social media to share stories and photos with their friends, vote and leave opinions, tag web pages, and so on. The digital footprints of these behaviors make online social media a semantic resource that we can exploit to better understand and organize this astronomical volume of information. In this dissertation, we first analyze online social media as a multi-dimensional social network, taking Wikipedia as a case study. We find that, given the multiple relations exposed from different perspectives in the network, focusing on only one specific relation could lead to biased or even wrong conclusions. Traditional information retrieval approaches are mainly keyword based and built on the bag-of-words model: they ignore word order in the text and measure relevance by the presence of keywords. We propose a generalized framework for word sense disambiguation based on Wikipedia. The proposed framework enables effective and efficient disambiguation by relating keyphrases (i.e., n-grams) in documents to their appropriate concepts in Wikipedia, where a concept is defined as a Wikipedia article. The framework is applicable to documents in different languages and under different settings. By adopting the disambiguation method, we can represent a textual document by the Wikipedia concepts it covers. We study the semantic tag recommendation task for web pages based on this concept model by exploring the semantic relations between tags and concepts that underlie human annotation activities. Web users participate in the information generation process by commenting on news articles, sharing stories, publishing opinions in microblog posts, and so on. However, user-generated content (e.g., comments, tweets) is often short and written in a free style, with grammatical errors and informal abbreviations. These adverse features degrade the performance of existing algorithms on many online social media tasks, such as named entity recognition and event detection. We propose an unsupervised approach for named entity recognition in a targeted Twitter stream. Within this work, we develop a tweet segmentation algorithm that splits each tweet into non-overlapping phrases, called tweet segments. Inspired by the semantic units produced by tweet segmentation, we further propose an effective and scalable algorithm for event detection in tweets based on tweet segments.
author2	Anwitaman Datta
author_facet	Anwitaman Datta Li, Chenliang.
format	Theses and Dissertations
author	Li, Chenliang.
author_sort	Li, Chenliang.
title	Harnessing online social media to deal with information overload.
title_short	Harnessing online social media to deal with information overload.
title_full	Harnessing online social media to deal with information overload.
title_fullStr	Harnessing online social media to deal with information overload.
title_full_unstemmed	Harnessing online social media to deal with information overload.
title_sort	harnessing online social media to deal with information overload.
publishDate	2013
url	https://hdl.handle.net/10356/54827
_version_	1759856584334245888

Similar Items