Semantic querying over knowledge in biomedical text corpora annotated with multiple ontologies

Bibliographic Details
Main Authors: Chua, Watson Wei Khong; Kim, Jung-jae
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language: English
Published: 2013
Online Access: https://hdl.handle.net/10356/96180
http://hdl.handle.net/10220/11920
Institution: Nanyang Technological University
Description
Summary: Existing ontology-based knowledge representation systems have been considerably more successful than keyword-based systems at semantic querying over large biomedical text corpora. However, their query expressivity is limited by the lack of cross-ontology integration and of semantic relations. We present a System for Multiple-Ontology Knowledge Representation (SMOKR) to alleviate this problem. The system first annotates phrases and the semantic relations between them using different domain ontologies, then instantiates the ontologies with the annotated phrases. It integrates the ontologies by matching their instances using simple NLP techniques and by matching their concepts using the state-of-the-art Biomedical Ontology Alignment Tool (BOAT). SMOKR then performs inconsistency detection to remove conflicting axioms, producing a consistent ontology for querying. We evaluate the system on a set of semantic queries and compare its results with those of a keyword-based search engine, Lucene, and a hybrid system, SSOKR_Luc, which combines a single-ontology knowledge representation system with Lucene. SMOKR achieves the best F-measures, 0.7 on the GRO Corpus and 0.87 on the GENIA Corpus, compared with 0.62 and 0.33 for SSOKR_Luc and 0.36 and 0.12 for Lucene.
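
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-measure can be computed for a single query. It assumes the balanced F1 variant (the summary does not state which variant or averaging scheme was used). The function name `f_measure` and the sentence identifiers are hypothetical and do not come from the paper.

```python
def f_measure(retrieved, relevant, beta=1.0):
    """Return the F-measure of a retrieved result set against a gold-standard set.

    beta=1.0 gives the balanced F1 score (an assumption; the record does not
    say which variant the authors used).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        # Covers empty result sets and results with no correct answers.
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical sentence IDs returned by a system for one semantic query,
# and the IDs a human annotator marked as correct answers.
system_answers = ["s1", "s2", "s3"]
gold_answers = ["s2", "s3", "s4", "s5"]
score = f_measure(system_answers, gold_answers)
print(round(score, 4))
```

Here two of the three returned sentences are correct (precision 2/3) and two of the four gold sentences are found (recall 1/2), so the harmonic mean falls between the two values.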