A study on content similarity between web pages

Bibliographic Details
Main Author: He, Shanshan.
Other Authors: Xiao Gaoxi
Format: Final Year Project
Language: English
Published: 2009
Subjects:
Online Access:http://hdl.handle.net/10356/14744
Institution: Nanyang Technological University
Description
Summary: Searching for information on the World Wide Web can sometimes be tedious, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, focusing on web crawlers. The project helps readers understand the correlation between the level of content similarity and the hyperlinks between web pages. Such understanding may be of great help to the future development of web crawlers. One major challenge was designing the programs, as program design was new to the author. The project report describes in full detail how the two programs were designed using the C# programming language and Microsoft Visual Studio 2005. Three methods were used to test whether directly connected web pages really have a higher content similarity level than web pages that are not directly connected. The results gathered from the tests were tabulated and presented graphically for clearer illustration.
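The summary does not say how content similarity was measured. As a minimal sketch, assuming each page is reduced to a word-count vector and pages are compared by cosine similarity (a common choice, not necessarily one of the three methods used in the project), the comparison between directly linked and non-linked page pairs could look like the Python below. The project itself was written in C#; the input format (`pages`, `links`), the tokenizer, and all function names here are hypothetical, and hyperlinks are treated as undirected by assumption.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Lower-cased word counts; a deliberately simple tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    # Counter returns 0 for terms missing from b, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def compare_linked_vs_unlinked(pages, links):
    """Average similarity of hyperlinked page pairs vs. pairs with no direct link.

    pages: dict mapping a hypothetical page id to its text
    links: set of (source, target) hyperlinks, treated as undirected here (assumption)
    """
    vectors = {pid: term_vector(text) for pid, text in pages.items()}
    undirected = {frozenset(edge) for edge in links}
    linked, unlinked = [], []
    for p, q in combinations(sorted(pages), 2):
        sim = cosine_similarity(vectors[p], vectors[q])
        if frozenset((p, q)) in undirected:
            linked.append(sim)
        else:
            unlinked.append(sim)
    return _mean(linked), _mean(unlinked)


pages = {
    "a": "web crawler downloads web pages",
    "b": "a web crawler follows hyperlinks between web pages",
    "c": "recipe for chocolate cake",
}
links = {("a", "b")}
linked_avg, unlinked_avg = compare_linked_vs_unlinked(pages, links)
print(f"linked: {linked_avg:.3f}, not directly linked: {unlinked_avg:.3f}")
```

In this toy input the linked pair shares several words while the unlinked pairs share none, so the linked average is higher; a real test would of course need many crawled pages and pairs before drawing any conclusion.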