Connections and content similarity between webpages
Saved in:
Main Author: |
---|---
Other Authors: |
Format: | Final Year Project
Language: | English
Published: | 2010
Subjects: |
Online Access: | http://hdl.handle.net/10356/40382
Institution: | Nanyang Technological University |
Summary: | The number of websites is growing rapidly, and internet traffic is becoming increasingly complex. Understanding how websites are linked to one another and analysing the content of web pages is therefore becoming more important. The main objective of this project is to understand how the web pages of a website are interlinked and how similar their content is. Such an understanding would be helpful for the future development of efficient web crawlers. To carry out the project, software that can measure the content similarity of web pages had to be implemented, so the work divides into two parts: implementation, and testing and analysis of the web pages. First, the Web Content Analyzer was implemented. It is a user interface that performs several functions: it lists all the web pages of a website folder by folder, so that the user can easily select any two pages to compare; it can also pick any two web pages from the website at random; it calculates the mean and standard deviation of the content similarity scores; and it stores the results in a database. To support these features, the project uses Apache as the web server, MySQL as the database, Dreamweaver CS4 for creating web pages, PHP and JavaScript for the program code, and Compare Suite for comparing the web pages. The frame of the main interface page was designed in Macromedia Dreamweaver, with embedded PHP code for calculating the mean and standard deviation; SQL commands are embedded in the PHP code to communicate with the database. After implementation, three hundred pairs of web pages were selected using three methods, and the mean and standard deviation of their content similarity were calculated. The test results help in understanding the interconnections and content similarity of the NTU website.
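The record does not include the project's code, but the aggregation step the abstract attributes to PHP is straightforward to sketch. The fragment below shows how similarity scores for a set of page pairs might be reduced to a mean and standard deviation and then stored through embedded SQL, in the spirit of the PHP-plus-MySQL setup described. The database name `web_content_analyzer`, the table `similarity_results`, its columns, and the connection credentials are all hypothetical, chosen only for illustration.

```php
<?php
// Minimal sketch of the mean / standard deviation step described in the
// abstract. Similarity scores here are placeholder values; in the project
// they would come from comparing pairs of web pages.
$scores = [0.42, 0.75, 0.13, 0.58, 0.91];

$n    = count($scores);
$mean = array_sum($scores) / $n;

// Population standard deviation: square root of the mean squared deviation.
$sumSq = 0.0;
foreach ($scores as $s) {
    $sumSq += ($s - $mean) ** 2;
}
$stdDev = sqrt($sumSq / $n);

printf("mean = %.4f, standard deviation = %.4f\n", $mean, $stdDev);

// Store the result via embedded SQL, as the abstract describes.
// Table and credentials are hypothetical.
$db   = new mysqli('localhost', 'user', 'password', 'web_content_analyzer');
$stmt = $db->prepare(
    'INSERT INTO similarity_results (pair_count, mean_score, std_dev)
     VALUES (?, ?, ?)'
);
$stmt->bind_param('idd', $n, $mean, $stdDev);
$stmt->execute();
$stmt->close();
$db->close();
?>
```

The abstract does not say whether the project used the population or the sample standard deviation; the sketch uses the population form (dividing by n rather than n - 1).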