Query-based text extraction algorithm for web pages.
The objective of this research is to develop a query-based text extraction algorithm that automatically generates an abstract from a Web document. The algorithm was derived from a study of a sample of 60 Web pages. These Web pages were chosen from 5 different subject areas and retrieved using t...
Main Author: | New, Chin Ker |
---|---|
Other Authors: | Khoo, Christopher Soo Guan |
Format: | Theses and Dissertations |
Published: | 2008 |
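Subjects: | DRNTU::Library and information science::Libraries::Automation; DRNTU::Library and information science::Libraries::Technologies |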
Online Access: | http://hdl.handle.net/10356/1632 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-1632 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1632; 2019-12-10T14:36:59Z; Query-based text extraction algorithm for web pages; New, Chin Ker; Khoo, Christopher Soo Guan; Wee Kim Wee School of Communication and Information; DRNTU::Library and information science::Libraries::Automation; DRNTU::Library and information science::Libraries::Technologies; The objective of this research is to develop a query-based text extraction algorithm that automatically generates an abstract from a Web document. The algorithm was derived from a study of a sample of 60 Web pages. These Web pages were chosen from 5 different subject areas and retrieved using the AltaVista Search Engine. The algorithm is based on sentence weight (obtained through a simple calculation), cue words, the location of the sentence, and the application of canned abstracts. To test the new algorithm, a total of 50 Web pages (from 10 different subject areas) were retrieved from the Internet through the AltaVista Search Engine. The abstracts of these Web pages were then generated manually by simulating the new algorithm; Master of Science (Information Studies); 2008-09-10T08:34:53Z; 2008-09-10T08:34:53Z; 2000; 2000; Thesis; http://hdl.handle.net/10356/1632; Nanyang Technological University; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
topic |
DRNTU::Library and information science::Libraries::Automation; DRNTU::Library and information science::Libraries::Technologies |
description |
The objective of this research is to develop a query-based text extraction algorithm that automatically generates an abstract from a Web document. The algorithm was derived from a study of a sample of 60 Web pages. These Web pages were chosen from 5 different subject areas and retrieved using the AltaVista Search Engine. The algorithm is based on sentence weight (obtained through a simple calculation), cue words, the location of the sentence, and the application of canned abstracts. To test the new algorithm, a total of 50 Web pages (from 10 different subject areas) were retrieved from the Internet through the AltaVista Search Engine. The abstracts of these Web pages were then generated manually by simulating the new algorithm. |
author2 |
Khoo, Christopher Soo Guan |
author_facet |
Khoo, Christopher Soo Guan; New, Chin Ker. |
format |
Theses and Dissertations |
author |
New, Chin Ker. |
title |
Query-based text extraction algorithm for web pages. |
publishDate |
2008 |
url |
http://hdl.handle.net/10356/1632 |
_version_ |
1681049025995866112 |
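
The description in this record names four ingredients of the algorithm: sentence weight, cue words, sentence location, and canned abstracts. The record does not give the thesis's formulas, cue-word list, or weights, so the following is only a minimal sketch of how query-based sentence scoring of this general kind might look. Every name and number in it (`CUE_WORDS`, `QUERY_WEIGHT`, `CUE_WEIGHT`, `LOCATION_WEIGHT`, `extract_abstract`) is a hypothetical choice made for illustration, and canned abstracts are left out entirely.

```python
import re

# Hypothetical cue words and weights, chosen only for illustration.
# The catalog record does not list the values used in the thesis.
CUE_WORDS = {"summary", "conclusion", "objective", "purpose", "results"}
QUERY_WEIGHT = 2.0     # score added per query term found in a sentence
CUE_WEIGHT = 1.0       # score added per cue word found in a sentence
LOCATION_WEIGHT = 1.5  # bonus for the first or last sentence


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_sentence(sentence, index, total, query_terms):
    """Combine query-term hits, cue-word hits, and a location bonus."""
    words = tokenize(sentence)
    query_hits = sum(1 for w in words if w in query_terms)
    cue_hits = sum(1 for w in words if w in CUE_WORDS)
    is_edge = index == 0 or index == total - 1
    location_bonus = LOCATION_WEIGHT if is_edge else 0.0
    return QUERY_WEIGHT * query_hits + CUE_WEIGHT * cue_hits + location_bonus


def extract_abstract(text, query, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    query_terms = set(tokenize(query))
    sentences = split_sentences(text)
    scored = [
        (score_sentence(s, i, len(sentences), query_terms), i, s)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))


if __name__ == "__main__":
    page = (
        "Solar panels convert sunlight into electricity. "
        "The weather was pleasant yesterday. "
        "Many people enjoy hiking. "
        "In conclusion, solar panels reduce electricity bills."
    )
    print(extract_abstract(page, "solar panels electricity"))
```

With the sample page and query above, the first and last sentences score highest because they contain the query terms and sit at the edges of the text, so they form the extract. Keeping the selected sentences in document order is a design choice of this sketch, made so that the extract reads naturally.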