Query-based text extraction algorithm for web pages.

The objective of this research is to develop a query-based text extraction algorithm to generate an abstract from a Web document automatically. The algorithm was derived after a study of a sample of 60 Web pages. These Web pages were chosen from 5 different subject areas and retrieved using t...

Bibliographic Details
Main Author:	New, Chin Ker.
Other Authors:	Khoo, Christopher Soo Guan
Format:	Theses and Dissertations
Published:	2008
Subjects:	DRNTU::Library and information science::Libraries::Automation DRNTU::Library and information science::Libraries::Technologies
Online Access:	http://hdl.handle.net/10356/1632
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-1632
record_format	dspace
spelling	sg-ntu-dr.10356-1632 2019-12-10T14:36:59Z Query-based text extraction algorithm for web pages. New, Chin Ker. Khoo, Christopher Soo Guan Wee Kim Wee School of Communication and Information DRNTU::Library and information science::Libraries::Automation DRNTU::Library and information science::Libraries::Technologies The objective of this research is to develop a query-based text extraction algorithm to generate an abstract from a Web document automatically. The algorithm was derived after a study of a sample of 60 Web pages. These Web pages were chosen from 5 different subject areas and retrieved using the AltaVista Search Engine. The development of this algorithm was based on sentence weight (through simple calculation), cue words, location of the sentence and the application of canned abstracts. To test the new algorithm, a total of 50 Web pages (from 10 different subject areas) were retrieved from the Internet through the AltaVista Search Engine. The abstracts of these Web pages were then generated manually by simulating the new algorithm. Master of Science (Information Studies) 2008-09-10T08:34:53Z 2008-09-10T08:34:53Z 2000 2000 Thesis http://hdl.handle.net/10356/1632 Nanyang Technological University application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
topic	DRNTU::Library and information science::Libraries::Automation DRNTU::Library and information science::Libraries::Technologies
spellingShingle	DRNTU::Library and information science::Libraries::Automation DRNTU::Library and information science::Libraries::Technologies New, Chin Ker. Query-based text extraction algorithm for web pages.
description	The objective of this research is to develop a query-based text extraction algorithm to generate an abstract from a Web document automatically. The algorithm was derived after a study of a sample of 60 Web pages. These Web pages were chosen from 5 different subject areas and retrieved using the AltaVista Search Engine. The development of this algorithm was based on sentence weight (through simple calculation), cue words, location of the sentence and the application of canned abstracts. To test the new algorithm, a total of 50 Web pages (from 10 different subject areas) were retrieved from the Internet through the AltaVista Search Engine. The abstracts of these Web pages were then generated manually by simulating the new algorithm.
author2	Khoo, Christopher Soo Guan
author_facet	Khoo, Christopher Soo Guan New, Chin Ker.
format	Theses and Dissertations
author	New, Chin Ker.
author_sort	New, Chin Ker.
title	Query-based text extraction algorithm for web pages.
title_short	Query-based text extraction algorithm for web pages.
title_full	Query-based text extraction algorithm for web pages.
title_fullStr	Query-based text extraction algorithm for web pages.
title_full_unstemmed	Query-based text extraction algorithm for web pages.
title_sort	query-based text extraction algorithm for web pages.
publishDate	2008
url	http://hdl.handle.net/10356/1632
_version_	1681049025995866112
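
The description names sentence weight, cue words and sentence location as the basis of the algorithm but gives no formulas. The following is a minimal illustrative sketch of that general kind of query-based sentence scoring, not the thesis's actual method. The cue-word list, the weights, and the function names (split_sentences, score_sentence, extract_abstract) are all assumptions made for illustration, and the canned-abstract step is not covered.

```python
import re

# Hypothetical cue words; the thesis's actual list is not given in this record.
CUE_WORDS = {"summary", "conclusion", "objective", "purpose", "results"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, index, total, query_terms):
    """Score one sentence using assumed weights (not taken from the thesis)."""
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    query_score = len(words & query_terms)   # distinct query terms present
    cue_score = len(words & CUE_WORDS)       # distinct cue words present
    # Location bonus: first or last sentence of the page text.
    location_score = 1 if index in (0, total - 1) else 0
    return 2 * query_score + cue_score + location_score


def extract_abstract(text, query, max_sentences=2):
    """Return the top-scoring sentences, joined in their original order."""
    sentences = split_sentences(text)
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = [
        (score_sentence(s, i, len(sentences), query_terms), i, s)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))


if __name__ == "__main__":
    page = (
        "Solar panels convert sunlight into electricity. "
        "Our shop opens at nine. "
        "The weather was nice yesterday. "
        "In conclusion, solar energy reduces electricity bills."
    )
    print(extract_abstract(page, "solar electricity"))
```

Selected sentences are emitted in document order so that the extract reads naturally; doubling the query-term weight is an arbitrary choice meant only to make the extract query-dependent.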

Similar Items