A study on content similarity between web pages
Searching for information on the World Wide Web can sometimes be tedious, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, focusing on the web crawler. This project helps readers understand the correlation between the content s...
Main Author: | He, Shanshan |
---|---|
Other Authors: | Xiao Gaoxi |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
Online Access: | http://hdl.handle.net/10356/14744 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-14744 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-14744 2023-07-07T17:30:24Z A study on content similarity between web pages He, Shanshan. Xiao Gaoxi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Searching for information on the World Wide Web can sometimes be tedious, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, focusing on the web crawler. This project helps readers understand the correlation between the content similarity level and inter-webpage hyperlinks. Such understanding may be of great help to the future development of web crawlers. One major challenge was designing the programs, which was new to the author. The report describes in full detail how to design these two programs using the C# programming language and Microsoft Visual Studio 2005. Three methods were used to test whether directly connected web pages really have a higher content similarity level than non-directly connected web pages. The results gathered from the tests were tabulated and presented graphically for clearer illustration. Bachelor of Engineering 2009-01-30T08:09:24Z 2009-01-30T08:09:24Z 2008 2008 Final Year Project (FYP) http://hdl.handle.net/10356/14744 en 70 p. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
spellingShingle | DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems He, Shanshan. A study on content similarity between web pages |
description | Searching for information on the World Wide Web can sometimes be tedious, even with the help of a web search engine. In this project, the author introduces the operation of web search engines, focusing on the web crawler. This project helps readers understand the correlation between the content similarity level and inter-webpage hyperlinks. Such understanding may be of great help to the future development of web crawlers. One major challenge was designing the programs, which was new to the author. The report describes in full detail how to design these two programs using the C# programming language and Microsoft Visual Studio 2005. Three methods were used to test whether directly connected web pages really have a higher content similarity level than non-directly connected web pages. The results gathered from the tests were tabulated and presented graphically for clearer illustration. |
author2 | Xiao Gaoxi |
author_facet | Xiao Gaoxi He, Shanshan. |
format | Final Year Project |
author | He, Shanshan. |
author_sort | He, Shanshan. |
title | A study on content similarity between web pages |
title_short | A study on content similarity between web pages |
title_full | A study on content similarity between web pages |
title_fullStr | A study on content similarity between web pages |
title_full_unstemmed | A study on content similarity between web pages |
title_sort | study on content similarity between web pages |
publishDate | 2009 |
url | http://hdl.handle.net/10356/14744 |
_version_ | 1772825649816272896 |