A strategy for extracting information from semi-structured web pages.
Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi-structured web pages (WPs) by handling the issue of synonyms, which arises because these WPs...
Main Authors: | Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana |
---|---|
Format: | Article |
Language: | English |
Published: | 2010 |
Online Access: | http://psasir.upm.edu.my/id/eprint/12868/ |
Institution: | Universiti Putra Malaysia |
id | my.upm.eprints.12868 |
---|---|
record_format | eprints |
spelling | my.upm.eprints.12868 2012-01-27T01:25:59Z http://psasir.upm.edu.my/id/eprint/12868/ A strategy for extracting information from semi-structured web pages. Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana. Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from web tables of semi-structured web pages (WPs) by handling the issue of synonym which emerges as these WPs have been designed and created without referring to any standards or guidelines. Findings – The paper finds that this strategy extracts information with high precision, and extracts the attributes besides the sub-attributes that describe the extracted attributes and values of the sub-attributes. Practical implications – Experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with high precision which is 98.98 percent. Originality/value – This paper contributes to the research on extracting information. 2010 Article PeerReviewed Shaker, Mahmoud and Ibrahim, Hamidah and Mustapha, Aida and Abdullah, Lili Nurliyana (2010) A strategy for extracting information from semi-structured web pages. International Journal of Web Information Systems, 6 (4). pp. 304-318. ISSN 1744-0084 10.1108/17440081011090239 English |
institution | Universiti Putra Malaysia |
building | UPM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Putra Malaysia |
content_source | UPM Institutional Repository |
url_provider | http://psasir.upm.edu.my/ |
language | English |
description |
Purpose – The aim of this paper is to propose a strategy for extracting information from web tables.
Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi-structured web pages (WPs) by handling the issue of synonyms, which arises because these WPs are designed and created without reference to any standards or guidelines.
Findings – The paper finds that this strategy extracts information with high precision, extracting not only the attributes but also the sub-attributes that describe them, together with the values of those sub-attributes.
Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a high precision of 98.98 percent.
Originality/value – This paper contributes to research on information extraction. |
format | Article |
author | Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana |
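The abstract describes the strategy only at a high level: attribute values are pulled from web tables while synonymous attribute labels, which appear because pages follow no common standard, are resolved. As a rough illustration of that general idea, and not the authors' algorithm, the following Python sketch reads HTML table rows into label/value pairs and maps synonymous labels onto one canonical attribute name. The `SYNONYMS` dictionary, the names `TableRowParser`, `extract_attributes` and `precision`, and the assumption that each row holds a label cell followed by a value cell are all hypothetical. Sub-attributes are not modelled. The `precision` helper gives the standard definition TP / (TP + FP), assumed here to be the measure behind the reported 98.98 percent figure.

```python
from html.parser import HTMLParser

# HYPOTHETICAL synonym dictionary (illustrative assumption, not from the paper):
# maps label variants found on differently designed pages to one canonical name.
SYNONYMS = {
    "weight": "weight",
    "mass": "weight",
    "size": "size",
    "dimensions": "size",
    "talk time": "talk_time",
    "talktime": "talk_time",
}


class TableRowParser(HTMLParser):
    """Collects the whitespace-normalised text of <td>/<th> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the cell currently open, or None

    def _flush_cell(self):
        # Close the open cell, if any (also covers an omitted </td>).
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        # Close the open row, if any (also covers an omitted </tr>).
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)  # includes text inside nested tags such as <b>

    def close(self):
        super().close()
        self._end_row()  # flush a row still open at the end of input


def extract_attributes(html, synonyms=SYNONYMS):
    """Return {canonical label: value} using the first two cells of each row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    result = {}
    for row in parser.rows:
        if len(row) < 2:
            continue  # rows without a label/value pair are skipped
        label = row[0].lower().rstrip(":").strip()
        result[synonyms.get(label, label)] = row[1]  # unknown labels kept as-is
    return result


def precision(true_positives, false_positives):
    """Standard precision TP / (TP + FP); 0.0 when nothing was extracted."""
    extracted = true_positives + false_positives
    return true_positives / extracted if extracted else 0.0
```

In this sketch a row labelled `Mass:` on one page and `Weight` on another both end up under the key `weight`. A fixed dictionary is the simplest possible stand-in for synonym handling; a real system would need a richer way to discover which labels are synonyms.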