Performance of Rule Identification from Web Pages
In the world of Web pages, there are oceans of documents in natural language texts and tables. To extract rules from Web pages and maintain consistency between them, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows the identification of rules on Web pages and ge...
Main Authors: | KANG, Juyoung; LEE, Jae Kyu |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2004 |
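Subjects: | Rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML; Computer Sciences; Management Information Systems |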
Online Access: | https://ink.library.smu.edu.sg/sis_research/1160 |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-2159 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-2159; 2010-12-22T08:24:06Z; Performance of Rule Identification from Web Pages; KANG, Juyoung; LEE, Jae Kyu; In the world of Web pages, there are oceans of documents in natural language texts and tables. To extract rules from Web pages and maintain consistency between them, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows the identification of rules on Web pages and generates the identified rules automatically. For this purpose, we have designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML), both parts of XRML. RIML is designed to identify rules not only from texts, but also from tables on Web pages, and to transform to the formal rules in RSML syntax automatically. While designing RIML, we considered the features of sharing variables and values, omitted terms, and synonyms. Using these features, rules can be identified or changed once, automatically generating their corresponding RSML rules. We have conducted an experiment to evaluate the effect of the RIML approach with real-world Web pages of Amazon.com, BarnesandNoble.com, and Powells.com. We found that 97.7 percent of the rules can be detected on the Web pages, and the completeness of generated rule components is 88.5 percent. This is good proof that XRML can facilitate the extraction and maintenance of rules from Web pages while building expert systems in the Semantic Web environment.; 2004-01-01T08:00:00Z; text; https://ink.library.smu.edu.sg/sis_research/1160; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML; Computer Sciences; Management Information Systems |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML; Computer Sciences; Management Information Systems |
spellingShingle | Rule identification; rule acquisition; knowledge engineering; knowledge acquisition; XRML; RuleML; XML; Computer Sciences; Management Information Systems; KANG, Juyoung; LEE, Jae Kyu; Performance of Rule Identification from Web Pages |
description | In the world of Web pages, there are oceans of documents in natural language texts and tables. To extract rules from Web pages and maintain consistency between them, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows the identification of rules on Web pages and generates the identified rules automatically. For this purpose, we have designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML), both parts of XRML. RIML is designed to identify rules not only from texts, but also from tables on Web pages, and to transform to the formal rules in RSML syntax automatically. While designing RIML, we considered the features of sharing variables and values, omitted terms, and synonyms. Using these features, rules can be identified or changed once, automatically generating their corresponding RSML rules. We have conducted an experiment to evaluate the effect of the RIML approach with real-world Web pages of Amazon.com, BarnesandNoble.com, and Powells.com. We found that 97.7 percent of the rules can be detected on the Web pages, and the completeness of generated rule components is 88.5 percent. This is good proof that XRML can facilitate the extraction and maintenance of rules from Web pages while building expert systems in the Semantic Web environment. |
format | text |
author | KANG, Juyoung; LEE, Jae Kyu |
author_facet | KANG, Juyoung; LEE, Jae Kyu |
author_sort | KANG, Juyoung |
title | Performance of Rule Identification from Web Pages |
title_short | Performance of Rule Identification from Web Pages |
title_full | Performance of Rule Identification from Web Pages |
title_fullStr | Performance of Rule Identification from Web Pages |
title_full_unstemmed | Performance of Rule Identification from Web Pages |
title_sort | performance of rule identification from web pages |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2004 |
url | https://ink.library.smu.edu.sg/sis_research/1160 |
_version_ | 1770570882427650048 |