A strategy for extracting information from semi-structured web pages.

Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from web tables of semi-structured web pages (WPs) by handling the synonym problem, which emerges as these WPs...
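
The abstract describes extracting attributes, their sub-attributes, and the sub-attribute values from web tables whose labels vary from page to page. The following is a minimal sketch of that general idea, not the authors' method: it assumes a hand-written synonym dictionary (`SYNONYMS`), a simple row layout of attribute, sub-attribute, and value cells, and made-up sample rows. All names in it are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical synonym dictionary: label variants -> one canonical name.
SYNONYMS = {
    "screen": "display",
    "dimensions": "size",
    "mass": "weight",
}


def canonical(label):
    """Normalize a table label and map known synonyms to one name."""
    key = label.strip().lower().rstrip(":")
    return SYNONYMS.get(key, key)


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html):
    """Return {attribute: {sub_attribute: value}} with canonical labels."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    record, current = {}, None
    for row in parser.rows:
        if len(row) == 3:          # attribute | sub-attribute | value
            attr, sub, value = row
            if attr:               # an empty cell continues the previous attribute
                current = canonical(attr)
        elif len(row) == 2:        # sub-attribute | value (attribute cell spans rows)
            sub, value = row
        else:
            continue
        if current is not None:
            record.setdefault(current, {})[canonical(sub)] = value
    return record


# Made-up sample tables that label the same information differently.
PAGE_A = """<table>
<tr><th>Screen</th><td>Dimensions</td><td>2.4 in</td></tr>
<tr><td>Resolution</td><td>240 x 320</td></tr>
<tr><th>Body</th><td>Mass</td><td>91 g</td></tr>
</table>"""

PAGE_B = """<table>
<tr><td>Display:</td><td>Size</td><td>2.6 in</td></tr>
<tr><td></td><td>Resolution</td><td>240 x 320</td></tr>
<tr><td>Body</td><td>Weight</td><td>120 g</td></tr>
</table>"""

print(extract(PAGE_A))
print(extract(PAGE_B))
```

Both sample pages end up with the same attribute and sub-attribute names, so their values can be compared or merged directly. A fixed dictionary is the simplest possible stand-in for synonym handling; the paper's own approach may differ.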

Bibliographic Details
Main Authors: Shaker, Mahmoud, Ibrahim, Hamidah, Mustapha, Aida, Abdullah, Lili Nurliyana
Format: Article
Language: English
Published: 2010
Online Access: http://psasir.upm.edu.my/id/eprint/12868/
Institution: Universiti Putra Malaysia
id my.upm.eprints.12868
record_format eprints
spelling my.upm.eprints.12868 2012-01-27T01:25:59Z http://psasir.upm.edu.my/id/eprint/12868/ Shaker, Mahmoud and Ibrahim, Hamidah and Mustapha, Aida and Abdullah, Lili Nurliyana (2010) A strategy for extracting information from semi-structured web pages. International Journal of Web Information Systems, 6 (4). pp. 304-318. ISSN 1744-0084. DOI 10.1108/17440081011090239. Article. PeerReviewed. English.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from web tables of semi-structured web pages (WPs) by handling the synonym problem, which emerges because these WPs have been designed and created without reference to any standards or guidelines. Findings – The paper finds that this strategy extracts information with high precision, and that it extracts attributes together with the sub-attributes that describe them and the values of those sub-attributes. Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a precision of 98.98 percent. Originality/value – This paper contributes to the research on extracting information.
format Article
author Shaker, Mahmoud
Ibrahim, Hamidah
Mustapha, Aida
Abdullah, Lili Nurliyana
title A strategy for extracting information from semi-structured web pages.
publishDate 2010
url http://psasir.upm.edu.my/id/eprint/12868/
_version_ 1643825158846152704