High performance data processing systems in Clouds

A web crawler surfs the web by traversing the hyperlinks on each page it visits. Because large amounts of data are linked together, following these links allows data to be gathered from one webpage to the next. Crawling and collecting Portable Document Format (PDF) is the main task in this projec...

Bibliographic Details
Main Author: Lai, Qi Rong
Other Authors: He Bingsheng
Format: Final Year Project
Language: English
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10356/63058
Institution: Nanyang Technological University
id sg-ntu-dr.10356-63058
record_format dspace
spelling sg-ntu-dr.10356-63058 2023-03-03T20:30:09Z High performance data processing systems in Clouds Lai, Qi Rong He Bingsheng School of Computer Engineering DRNTU::Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2015-05-05T07:57:56Z 2015-05-05T07:57:56Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/63058 en Nanyang Technological University 51 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Lai, Qi Rong
High performance data processing systems in Clouds
description A web crawler surfs the web by traversing the hyperlinks on each page it visits. Because large amounts of data are linked together, following these links allows data to be gathered from one webpage to the next. The main task in this project is crawling and collecting Portable Document Format (PDF) files. PDF content is unstructured data. The collected PDFs are processed and compiled into a more visually friendly image form, the word cloud. A word cloud is built from the words in a PDF, with each word's font size scaled to represent its importance according to its frequency in the PDF. A word cloud can be said to represent the content of a PDF, since the most frequent words make up the larger portion of it. Steganography is used to conceal the word-frequency data within the generated word cloud image. This method produces an image that looks the same to the human eye but has information embedded in it. The information embedded in a word cloud can then be retrieved and compiled to compute the similarity between different word clouds.
author2 He Bingsheng
author_facet He Bingsheng
Lai, Qi Rong
format Final Year Project
author Lai, Qi Rong
author_sort Lai, Qi Rong
title High performance data processing systems in Clouds
publishDate 2015
url http://hdl.handle.net/10356/63058
_version_ 1759856251565506560
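
The pipeline summarized in the description (word frequencies, frequency-scaled font sizes, hiding the frequencies in the image, then comparing word clouds) can be illustrated with a minimal sketch. The record does not say which steganographic technique or similarity measure the project used, so least-significant-bit (LSB) embedding and cosine similarity are assumptions here, as are all function names, the point-size range, and the flat byte array standing in for image pixels. PDF text extraction and image rendering are left out.

```python
import json
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase alphabetic words in text already extracted from a PDF."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def font_sizes(freqs, min_pt=10, max_pt=60):
    """Scale font size linearly with frequency (point range is an assumption)."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in freqs.items()}


def embed(pixels, payload):
    """Hide payload bytes in the lowest bit of each pixel byte (LSB, assumed)."""
    data = len(payload).to_bytes(4, "big") + payload  # 4-byte length header
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # each pixel byte changes by at most 1
    return out


def extract(pixels):
    """Recover the payload written by embed()."""
    def read(start, n_bytes):
        value = bytearray()
        for b in range(n_bytes):
            byte = 0
            for i in range(8):
                byte = (byte << 1) | (pixels[start + b * 8 + i] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read(0, 4), "big")
    return read(32, length)  # payload starts after the 32 header bits


def cosine_similarity(a, b):
    """Compare two word-frequency tables (cosine measure is an assumption)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Usage with made-up text and a stand-in pixel buffer.
freqs = word_frequencies("cloud data cloud crawler cloud data")
sizes = font_sizes(freqs)
pixels = bytearray(range(256)) * 4
stego = embed(pixels, json.dumps(freqs).encode())
recovered = Counter(json.loads(extract(stego).decode()))
other = word_frequencies("crawler links crawler data")
similarity = cosine_similarity(recovered, other)
```

Because only the lowest bit of each byte changes, the stego image differs from the original by at most one intensity level per byte, which is consistent with the description's claim that the result looks the same to the human eye.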