Query-based text extraction algorithm for web pages.

Bibliographic Details
Main Author: New, Chin Ker.
Other Authors: Khoo, Christopher Soo Guan
Format: Theses and Dissertations
Published: 2008
Subjects: DRNTU::Library and information science::Libraries::Automation; DRNTU::Library and information science::Libraries::Technologies
Online Access: http://hdl.handle.net/10356/1632 (PDF)
Institution: Nanyang Technological University
School: Wee Kim Wee School of Communication and Information
Degree: Master of Science (Information Studies)
Thesis date: 2000
Collection: DR-NTU, NTU Library, Singapore

Abstract

The objective of this research is to develop a query-based text extraction algorithm that automatically generates an abstract from a Web document. The algorithm was derived from a study of a sample of 60 Web pages, chosen from 5 different subject areas and retrieved using the AltaVista search engine. It is based on sentence weight (obtained through a simple calculation), cue words, the location of the sentence and the application of canned abstracts. To test the new algorithm, a total of 50 Web pages from 10 different subject areas were retrieved from the Internet through the AltaVista search engine. Abstracts of these Web pages were then generated manually by simulating the new algorithm.
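The abstract names the ingredients of the algorithm (sentence weight, cue words, sentence location and canned abstracts) but this record does not give the thesis's actual formulas, cue-word list or weights. The Python sketch below is a hypothetical illustration only, not the author's method. It assumes that sentence weight is a count of query-term matches, uses an invented cue-word list (`CUE_WORDS`) and arbitrary weights (`QUERY_WEIGHT`, `CUE_WEIGHT`, `POSITION_WEIGHT`), gives a location bonus to the first and last sentences, and treats a "canned abstract" as stock fallback text returned when no sentence matches the query.

```python
import re

# Hypothetical cue words and weights: the thesis's real values are not given in this record.
CUE_WORDS = {"conclusion", "summary", "important", "significant", "purpose"}
QUERY_WEIGHT = 2.0     # assumed weight per query-term occurrence
CUE_WEIGHT = 1.0       # assumed weight per cue-word occurrence
POSITION_WEIGHT = 1.0  # assumed bonus for the first or last sentence


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extract_abstract(text, query, max_sentences=2, canned_abstract=""):
    """Select the highest-weighted query-matching sentences as an extract."""
    query_terms = set(tokenize(query))
    sentences = split_sentences(text)
    candidates = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        query_hits = sum(1 for w in words if w in query_terms)
        if query_hits == 0:
            continue  # query-based: skip sentences sharing no term with the query
        cue_hits = sum(1 for w in words if w in CUE_WORDS)
        position_bonus = 1 if i in (0, len(sentences) - 1) else 0
        weight = (QUERY_WEIGHT * query_hits
                  + CUE_WEIGHT * cue_hits
                  + POSITION_WEIGHT * position_bonus)
        candidates.append((weight, i, sentence))
    if not candidates:
        return canned_abstract  # assumed fallback: stock "canned" text
    # Highest weight first; ties broken by earlier position.
    top = sorted(candidates, key=lambda c: (-c[0], c[1]))[:max_sentences]
    # Restore original document order so the extract reads naturally.
    return " ".join(s for _, _, s in sorted(top, key=lambda c: c[1]))
```

Sorting the selected sentences back into document order is a design choice that keeps the extract readable; ranking alone would output the strongest sentence first even when it appears last on the page.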