Tools for analysis of large-scale networks (I) algorithms, analytics and visualization

Data is a valuable asset, but only to those with the data mining skills to analyze it and reveal the trends or patterns hidden in otherwise unstructured data. This project aimed to create a tool that helps the user gain insights from a large-...

Bibliographic Details
Main Author: Chua, Chee Ann
Other Authors: Hsu Wen Jing
Format: Final Year Project
Language: English
Published: 2017
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/70109
Institution: Nanyang Technological University
id sg-ntu-dr.10356-70109
record_format dspace
spelling sg-ntu-dr.10356-70109 2023-03-03T20:23:18Z; Tools for analysis of large-scale networks (I) algorithms, analytics and visualization; Chua, Chee Ann; Hsu Wen Jing; School of Computer Science and Engineering; DRNTU::Engineering::Computer science and engineering; Bachelor of Engineering (Computer Science); 2017-04-11T08:56:55Z; 2017; Final Year Project (FYP); http://hdl.handle.net/10356/70109; en; Nanyang Technological University; 51 p.; application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Chua, Chee Ann
Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
description Data is a valuable asset, but only to those with the data mining skills to analyze it and reveal the trends or patterns hidden in otherwise unstructured data. This project aimed to create a tool that helps the user gain insights from a large-scale dataset by applying multiple data mining processes to the data and visualizing the results. Among social media sites, Twitter was chosen, and 500 million raw tweets were used as the dataset. Only part of the information in each tweet was extracted for analysis: the geo-coordinates, the timestamp, and the tweet content itself. To cleanse the data, preprocessing was performed to filter out records with missing attributes. The analysis consisted of two data mining techniques: cluster analysis of the geo-coordinates and topic modeling of the tweet content. These two techniques were applied not only separately but also together, to build further features such as a tracking system that could reveal a user’s mobility and active places from the data. With these features, the developed tool was able to turn the raw data into useful information.
author2 Hsu Wen Jing
author_facet Hsu Wen Jing
Chua, Chee Ann
format Final Year Project
author Chua, Chee Ann
author_sort Chua, Chee Ann
title Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
title_short Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
title_full Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
title_fullStr Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
title_full_unstemmed Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
title_sort tools for analysis of large-scale networks (i) algorithms, analytics and visualization
publishDate 2017
url http://hdl.handle.net/10356/70109
_version_ 1759855386662273024
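
The description mentions filtering out tweets with missing attributes and then grouping them by geo-coordinates. The record does not say which clustering algorithm or data schema the project used, so the following is only a minimal sketch under stated assumptions: the field names (`lat`, `lon`, `timestamp`, `text`), the sample records, and the fixed-size grid binning used as a simple stand-in for cluster analysis are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: these field names are assumptions,
# not the schema used in the project.
Tweet = Dict[str, object]
REQUIRED_FIELDS = ("lat", "lon", "timestamp", "text")


def is_complete(tweet: Tweet) -> bool:
    """Return True only if every required attribute is present and non-empty."""
    return all(tweet.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def grid_cell(lat: float, lon: float, cell_deg: float) -> Tuple[int, int]:
    """Map a coordinate to the integer index of the grid cell containing it."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def group_by_cell(
    tweets: List[Tweet], cell_deg: float = 0.5
) -> Dict[Tuple[int, int], List[Tweet]]:
    """Drop incomplete tweets, then bucket the rest by grid cell.

    Grid binning is a crude stand-in for cluster analysis; it is not
    claimed to be the method used in the project.
    """
    cells: Dict[Tuple[int, int], List[Tweet]] = defaultdict(list)
    for tweet in tweets:
        if not is_complete(tweet):
            continue  # preprocessing: skip records with missing attributes
        key = grid_cell(float(tweet["lat"]), float(tweet["lon"]), cell_deg)
        cells[key].append(tweet)
    return dict(cells)


# Made-up sample records for illustration only.
SAMPLE_TWEETS: List[Tweet] = [
    {"lat": 1.35, "lon": 103.82, "timestamp": "2016-01-01T08:00:00Z", "text": "morning run"},
    {"lat": 1.30, "lon": 103.85, "timestamp": "2016-01-01T09:00:00Z", "text": "coffee"},
    {"lat": 40.71, "lon": -74.01, "timestamp": "2016-01-01T10:00:00Z", "text": "commute"},
    {"lat": None, "lon": 103.90, "timestamp": "2016-01-01T11:00:00Z", "text": "no location"},
    {"lat": 1.29, "lon": 103.85, "timestamp": "2016-01-01T12:00:00Z", "text": ""},
]

cells = group_by_cell(SAMPLE_TWEETS)
for cell, members in sorted(cells.items()):
    print(cell, len(members))
```

With these sample records, the two incomplete tweets are dropped and the three remaining ones fall into two grid cells. A density-based or centroid-based algorithm could replace `group_by_cell` without changing the filtering step.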