Prediction of disease-disease associations based on relation extraction from biomedical journals using support vector machines

Bibliographic Details
Main Author: Laron, Andrew V.
Format: text
Language: English
Published: Animo Repository 2017
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/5765
Institution: De La Salle University
Description
Summary: Predicting novel associations between biomedical entities, such as genes, drugs and diseases, can suggest new topics for experiments and new insights in drug design. Because of the massive amount of relevant data available, a computational approach is well suited to this task. Initial data can be taken either from curated databases of biomedical terms and the relations between them, or directly from the text of research articles. Existing studies that predict associations between diseases from published articles generally use a co-occurrence-based approach: they extract the names of diseases and other entities from articles and weight each entity pair by how many different documents it occurs together in. This paper describes a semantic analysis-based approach instead. It extracts biological events and relations between biochemical entities and diseases from texts, and identifies a general association between two entities only if at least one instance of a relation between them was extracted. The system had an overall accuracy of 84.35% when tested with five-fold cross-validation on 86 articles from PubMed Central Open Access. The effectiveness of several instance features in improving relation extraction was tested: a 1-token-window bag of words around tokens indicating biomedical entities was found to improve accuracy, while entity distance, token distance, and syntactic dependency subtree features had little effect on accuracy.
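The summary credits a 1-token-window bag of words around entity tokens with improving relation extraction. The sketch below shows one plausible way to compute such a feature; it is not the thesis's implementation. Several details are assumptions because the record does not specify them: the hypothetical function name `window_bag_of_words`, the exclusion of entity tokens from their own context, lowercasing, and the `bow=` feature-name prefix.

```python
from collections import Counter
from typing import Iterable, Sequence


def window_bag_of_words(
    tokens: Sequence[str],
    entity_positions: Iterable[int],
    window: int = 1,
) -> Counter:
    """Hypothetical sketch: count context tokens within `window` positions
    of each entity token in a tokenized sentence.

    Assumptions (not stated in the source record): entity tokens are
    excluded from the context, tokens are lowercased, and each feature
    is named with a "bow=" prefix.
    """
    entity_set = set(entity_positions)
    features: Counter = Counter()
    for pos in sorted(entity_set):
        # Clamp the window to the sentence boundaries.
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for i in range(lo, hi):
            if i not in entity_set:  # keep only surrounding context words
                features[f"bow={tokens[i].lower()}"] += 1
    return features


if __name__ == "__main__":
    sentence = ["BRCA1", "mutations", "increase", "breast", "cancer", "risk"]
    # Positions 0 (BRCA1) and 3-4 (breast cancer) are the entity tokens.
    print(window_bag_of_words(sentence, [0, 3, 4]))
```

For the example sentence, the window picks up "mutations", "increase" and "risk" once each. A sparse feature dictionary like this could then be converted to a vector (for instance with scikit-learn's `DictVectorizer`) and passed to an SVM classifier. That pipeline is likewise an assumption, since the record names support vector machines but does not describe the feature encoding.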