Connections and content similarity between webpages

Bibliographic Details
Main Author:	Cung, Boi Thawng
Other Authors:	Xiao Gaoxi
Format:	Final Year Project
Language:	English
Published:	2010
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	http://hdl.handle.net/10356/40382
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-40382
record_format	dspace
spelling	sg-ntu-dr.10356-40382 2023-07-07T17:08:13Z; Connections and content similarity between webpages; Cung, Boi Thawng; Xiao Gaoxi; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems; Bachelor of Engineering; 2010-06-15T04:16:10Z; 2010-06-15T04:16:10Z; 2010; 2010; Final Year Project (FYP); http://hdl.handle.net/10356/40382; en; Nanyang Technological University; 68 p.; application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
description	Nowadays the number of websites is growing significantly day by day, and Internet traffic is becoming increasingly complicated. Understanding how websites are linked to each other and analysing their content are becoming more important for browsing the web. The main objective of this project is to understand how the web pages within a website are interlinked and how similar their content is. This understanding would be helpful for the future development of efficient web crawlers. To carry out the project, software that can observe or calculate the content similarity of web pages had to be implemented, so the project falls into two parts: implementation, and testing or analysis of the web pages. First, the Web Content Analyzer was implemented. The Web Content Analyzer is a user interface that can perform multiple functions. With it, all the web pages in a website can be listed folder by folder, so that the user can easily select any two web pages to compare. Another important feature is that it can randomly select any two web pages from the website. The Web Content Analyzer can also calculate the mean value and standard deviation of the content similarity, and the similarity results can be stored in the database through it. To provide these features, the project uses Apache as the web server; MySQL as the database; Dreamweaver CS4 for creating web pages; PHP and JavaScript for the program code; and Compare Suite for comparing the web pages. The frame of the main interface web page was designed using Macromedia Dreamweaver, and PHP code for calculating the mean and standard deviation was embedded in it. SQL commands are embedded in the PHP code to communicate with the database. After implementation, three hundred pairs of web pages were selected using three methods, and the mean value and standard deviation of their content similarity were calculated. The test results help to explain the interconnections and content similarity of the NTU website.
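The abstract describes four pieces of bookkeeping: listing a site's pages folder by folder, drawing random pairs of pages, summarizing content similarity by its mean and standard deviation, and storing the results in a database. The project itself did this in PHP with MySQL and obtained the similarity scores from Compare Suite. The Python sketch below only illustrates that bookkeeping under stated assumptions: similarity scores are supplied as input (no particular similarity algorithm is implied), SQLite stands in for MySQL, the sample (n-1) standard deviation is assumed, and the function names, table layout, URLs and scores are all hypothetical.

```python
import posixpath
import random
import sqlite3
import statistics
from collections import defaultdict
from urllib.parse import urlparse


def group_by_folder(urls):
    """List pages folder by folder: map each URL's directory path to its sorted pages."""
    folders = defaultdict(list)
    for url in urls:
        path = urlparse(url).path or "/"
        folders[posixpath.dirname(path) or "/"].append(url)
    return {folder: sorted(pages) for folder, pages in sorted(folders.items())}


def random_pair(urls, rng=random):
    """Draw two distinct pages uniformly at random (a page is never paired with itself)."""
    if len(urls) < 2:
        raise ValueError("need at least two pages to form a pair")
    first, second = rng.sample(list(urls), 2)
    return first, second


def summarize(scores):
    """Return the mean and the sample (n-1) standard deviation of similarity scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    return statistics.mean(scores), statistics.stdev(scores)


def store_results(conn, rows):
    """Insert (page_a, page_b, score) rows; SQLite stands in for the project's MySQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similarity "
        "(page_a TEXT NOT NULL, page_b TEXT NOT NULL, score REAL NOT NULL)"
    )
    # Parameter placeholders keep page URLs out of the SQL text itself.
    conn.executemany("INSERT INTO similarity VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    # Hypothetical URLs, not pages from the project's data set.
    pages = [
        "http://www.example.org/about/index.html",
        "http://www.example.org/about/history.html",
        "http://www.example.org/news/2010.html",
    ]
    print(group_by_folder(pages))
    rng = random.Random(0)
    pairs = [random_pair(pages, rng) for _ in range(3)]
    # Placeholder scores; in the project the scores came from Compare Suite.
    scores = [0.25, 0.5, 0.75]
    print(summarize(scores))  # (0.5, 0.25)
    conn = sqlite3.connect(":memory:")
    store_results(conn, [(a, b, s) for (a, b), s in zip(pairs, scores)])
    print(conn.execute("SELECT COUNT(*) FROM similarity").fetchone()[0])  # 3
```

Drawing each pair with `sample` guarantees that the two pages differ; the abstract does not say whether the random mode allowed the same pair to be drawn twice, and this sketch does not deduplicate pairs across draws.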
author2	Xiao Gaoxi
format	Final Year Project
author	Cung, Boi Thawng
title	Connections and content similarity between webpages
publishDate	2010
url	http://hdl.handle.net/10356/40382

Similar Items