Hierarchical Multi-label Associative Classification for Protein Function Prediction Using Gene Ontology

Bibliographic Details
Main Authors: Sawinee Sangsuriyun, Thanawin Rakthanmanon, Kitsana Waiyamai
Language: English
Published: Science Faculty of Chiang Mai University 2019
Online Access: http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=9788
http://cmuir.cmu.ac.th/jspui/handle/6653943832/66000
Institution: Chiang Mai University
Description
Summary: In this paper, protein function prediction is treated as a complex hierarchical multi-label classification problem. Each instance can be assigned to several classes, and the classes are organized in a hierarchy in which they are linked by parent-child relationships. eHMAC is an extended Hierarchical Multi-label Associative Classification (HMAC) method proposed for automated protein function prediction. The main objective of this paper is to improve both the accuracy and the explanatory ability of HMAC in predicting the functions of new protein sequences. The idea is to use the Gene Ontology (GO) as background knowledge and integrate it into different steps of HMAC. The three GO domains (molecular function, biological process, and cellular component) are used as background knowledge to generate high-quality classification rules for predicting protein functions. The experimental results showed that eHMAC, using this background knowledge, gave significantly better results than the previously proposed HMAC. Not only was the prediction accuracy greatly improved, but so was the explanatory ability of the function prediction model, in terms of the associations between motifs and GO terms.