A study on content similarity between web pages

Searching for information on the World Wide Web can be tedious at times, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, with a focus on the web crawler. The project helps readers understand the correlation between the content s...

Bibliographic Details
Main Author: He, Shanshan.
Other Authors: Xiao Gaoxi
Format: Final Year Project
Language: English
Published: 2009
Online Access: http://hdl.handle.net/10356/14744
Institution: Nanyang Technological University
id sg-ntu-dr.10356-14744
record_format dspace
spelling sg-ntu-dr.10356-14744 2023-07-07T17:30:24Z A study on content similarity between web pages He, Shanshan. Xiao Gaoxi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Searching for information on the World Wide Web can be tedious at times, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, with a focus on the web crawler. The project helps readers understand the correlation between content similarity level and inter-webpage hyperlinks. Such understanding may be of great help to the future development of web crawlers. One major challenge was designing the programs, which was new to the author. The report describes in full detail how to design these two programs using the C# programming language and Microsoft Visual Studio 2005. Three methods were used to test whether directly connected web pages really have a higher content similarity level than non-directly connected web pages. The results gathered from the tests were tabulated and presented graphically for clearer illustration. Bachelor of Engineering 2009-01-30T08:09:24Z 2009-01-30T08:09:24Z 2008 2008 Final Year Project (FYP) http://hdl.handle.net/10356/14744 en 70 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
He, Shanshan.
A study on content similarity between web pages
description Searching for information on the World Wide Web can be tedious at times, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, with a focus on the web crawler. The project helps readers understand the correlation between content similarity level and inter-webpage hyperlinks. Such understanding may be of great help to the future development of web crawlers. One major challenge was designing the programs, which was new to the author. The report describes in full detail how to design these two programs using the C# programming language and Microsoft Visual Studio 2005. Three methods were used to test whether directly connected web pages really have a higher content similarity level than non-directly connected web pages. The results gathered from the tests were tabulated and presented graphically for clearer illustration.
author2 Xiao Gaoxi
author_facet Xiao Gaoxi
He, Shanshan.
format Final Year Project
author He, Shanshan.
author_sort He, Shanshan.
title A study on content similarity between web pages
title_sort study on content similarity between web pages
publishDate 2009
url http://hdl.handle.net/10356/14744
_version_ 1772825649816272896