Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
Data is a valuable asset, but only for people who have adequate skills of data mining and apply them to analyze and reveal the trends or patterns that are hidden inside the otherwise unstructured data. This project aimed to create a tool that is able to help the user to gain insights from a large-...
Main Author: | Chua, Chee Ann |
---|---|
Other Authors: | Hsu Wen Jing |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/70109 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-70109 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-70109; 2023-03-03T20:23:18Z; Tools for analysis of large-scale networks (I) algorithms, analytics and visualization; Chua, Chee Ann; Hsu Wen Jing; School of Computer Science and Engineering; DRNTU::Engineering::Computer science and engineering; Bachelor of Engineering (Computer Science); 2017-04-11T08:56:55Z; 2017-04-11T08:56:55Z; 2017; Final Year Project (FYP); http://hdl.handle.net/10356/70109; en; Nanyang Technological University; 51 p.; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering |
description |
Data is a valuable asset, but only for those who have adequate data mining skills and apply them to analyze the data and reveal the trends and patterns hidden in otherwise unstructured data.
This project aimed to create a tool that helps the user gain insights from a large-scale dataset by applying multiple data mining processes to the data and visualizing the results. Among the social media sites, Twitter was chosen, and 500 million raw tweets were used as the dataset. Only part of the information in each tweet was extracted for analysis: the geo-coordinates, the timestamp, and the tweet content itself.
To cleanse the data, preprocessing was performed to filter out records with missing attributes. The analysis consisted of two data mining techniques: cluster analysis of the geo-coordinates and topic modeling of the tweet content. The two techniques were not only applied separately but were also integrated to build further features, such as a tracking system that could reveal a user’s mobility and active places from the data. With these features, the developed tool was able to turn the raw data into useful and valuable information. |
author2 |
Hsu Wen Jing |
author_facet |
Hsu Wen Jing Chua, Chee Ann |
format |
Final Year Project |
author |
Chua, Chee Ann |
title |
Tools for analysis of large-scale networks (I) algorithms, analytics and visualization |
publishDate |
2017 |
url |
http://hdl.handle.net/10356/70109 |
_version_ |
1759855386662273024 |
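
The preprocessing step mentioned in the description (extracting the geo-coordinates, timestamp, and tweet content, then filtering out records with missing attributes) could look roughly like the Python sketch below. It is only an illustration and not the project's actual code: the record does not say how the tweets were stored, so the dictionary layout, the field names `coordinates`, `timestamp`, and `text`, and the helper names `extract_fields` and `preprocess` are all assumptions.

```python
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical field names; the real dataset layout is not given in the record.
REQUIRED_FIELDS = ("coordinates", "timestamp", "text")


def extract_fields(tweet: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return only the three attributes of interest, or None if any is missing."""
    record = {name: tweet.get(name) for name in REQUIRED_FIELDS}
    for value in record.values():
        # Treat absent keys, explicit None, and blank strings as missing.
        if value is None:
            return None
        if isinstance(value, str) and not value.strip():
            return None
    return record


def preprocess(tweets: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only tweets that carry all required attributes."""
    cleaned = []
    for tweet in tweets:
        record = extract_fields(tweet)
        if record is not None:
            cleaned.append(record)
    return cleaned


# Made-up sample records, for illustration only.
sample = [
    {"coordinates": (103.68, 1.35), "timestamp": "2017-01-01T00:00:00Z",
     "text": "hello", "lang": "en"},
    {"coordinates": None, "timestamp": "2017-01-01T00:01:00Z",
     "text": "no location"},
    {"timestamp": "2017-01-01T00:02:00Z", "text": "field absent"},
    {"coordinates": (103.85, 1.29), "timestamp": "2017-01-01T00:03:00Z",
     "text": "   "},
]

cleaned = preprocess(sample)
print(len(cleaned))
```

In this sketch, extra attributes such as `lang` are dropped during extraction, so later stages (clustering on the coordinates, topic modeling on the text) would see only the three fields the description names.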