High performance data processing systems in Clouds
A web crawler can browse the web and traverse the hyperlinks it encounters. Because large amounts of data are linked together, traversing these links lets us gather data from one webpage after another. Crawling and collecting Portable Document Format (PDF) files is the main task in this projec...
Main Author: | Lai, Qi Rong |
---|---|
Other Authors: | He Bingsheng |
Format: | Final Year Project |
Language: | English |
Published: | 2015 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/63058 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-63058 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-63058 2023-03-03T20:30:09Z High performance data processing systems in Clouds Lai, Qi Rong He Bingsheng School of Computer Engineering DRNTU::Engineering::Computer science and engineering A web crawler can browse the web and traverse the hyperlinks it encounters. Because large amounts of data are linked together, traversing these links lets us gather data from one webpage after another. Crawling and collecting Portable Document Format (PDF) files is the main task in this project. PDF content is unstructured data. Collected PDFs are processed and compiled into a more visually friendly image form, the word cloud. A word cloud is built from the words within a PDF, and each word's font size is scaled to represent its importance according to its frequency in the PDF. A word cloud is said to represent the content of a PDF, since the most frequent words make up the larger portion of it. Steganography is used to conceal the word-frequency information in the generated word cloud image. This method generates an image that looks the same to the human eye but has information embedded in it. The information embedded in the word clouds can then be retrieved and compiled to compute the similarity between different word clouds. Bachelor of Engineering (Computer Science) 2015-05-05T07:57:56Z 2015-05-05T07:57:56Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/63058 en Nanyang Technological University 51 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering |
spellingShingle |
DRNTU::Engineering::Computer science and engineering Lai, Qi Rong High performance data processing systems in Clouds |
description |
A web crawler can browse the web and traverse the hyperlinks it encounters. Because large amounts of data are linked together, traversing these links lets us gather data from one webpage after another. Crawling and collecting Portable Document Format (PDF) files is the main task in this project. PDF content is unstructured data. Collected PDFs are processed and compiled into a more visually friendly image form, the word cloud. A word cloud is built from the words within a PDF, and each word's font size is scaled to represent its importance according to its frequency in the PDF. A word cloud is said to represent the content of a PDF, since the most frequent words make up the larger portion of it. Steganography is used to conceal the word-frequency information in the generated word cloud image. This method generates an image that looks the same to the human eye but has information embedded in it. The information embedded in the word clouds can then be retrieved and compiled to compute the similarity between different word clouds. |
author2 |
He Bingsheng |
author_facet |
He Bingsheng; Lai, Qi Rong |
format |
Final Year Project |
author |
Lai, Qi Rong |
author_sort |
Lai, Qi Rong |
title |
High performance data processing systems in Clouds |
title_short |
High performance data processing systems in Clouds |
title_full |
High performance data processing systems in Clouds |
title_fullStr |
High performance data processing systems in Clouds |
title_full_unstemmed |
High performance data processing systems in Clouds |
title_sort |
high performance data processing systems in clouds |
publishDate |
2015 |
url |
http://hdl.handle.net/10356/63058 |
_version_ |
1759856251565506560 |
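
The description above outlines two steps that a short sketch may make more concrete: scaling font size by word frequency, and hiding the frequency data inside the image. The record does not say which scaling rule or which steganographic technique the project used, so the code below is only an illustration under stated assumptions: a linear frequency-to-size mapping, least-significant-bit (LSB) embedding, and a plain `bytearray` standing in for the word cloud's pixel data. All function names (`word_frequencies`, `font_sizes`, `embed`, `extract`) are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Count lowercase alphabetic words and keep the top_n most frequent."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top_n)


def font_sizes(freqs, min_pt=10, max_pt=40):
    """Linearly map each frequency onto [min_pt, max_pt] (assumed rule)."""
    counts = [count for _, count in freqs]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in freqs}


def embed(pixels, message):
    """Hide message in the least significant bit of each pixel byte.

    A 2-byte big-endian length prefix is stored first so that extract()
    knows how many bytes to read back. LSB embedding is an assumption,
    not necessarily the technique used in the project.
    """
    payload = len(message).to_bytes(2, "big") + message
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return out


def extract(pixels):
    """Recover the message written by embed()."""
    def read_bytes(start_bit, n):
        value = bytearray()
        for b in range(n):
            byte = 0
            for i in range(8):
                byte = (byte << 1) | (pixels[start_bit + b * 8 + i] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length)


# Toy text standing in for the words extracted from a crawled PDF.
freqs = word_frequencies("cloud data cloud crawler data cloud")
sizes = font_sizes(freqs)

# Serialize the frequencies and hide them in a stand-in pixel buffer.
message = ";".join(f"{w}:{c}" for w, c in freqs).encode()
cover = bytearray(range(256)) * 2
stego = embed(cover, message)
recovered = extract(stego)
```

Because only the lowest bit of each byte changes, every value in `stego` differs from `cover` by at most 1, which is why such an image can look unchanged to the human eye. The recovered frequency lists from several images could then be compared to estimate how similar their word clouds are.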