Hierarchical Multi-label Associative Classification for Protein Function Prediction Using Gene Ontology

In this paper, protein function prediction is treated as a complex hierarchical multi-label classification problem. Each instance can be assigned to several classes, and the classes are organized in a hierarchy of parent-child relationships. eHMAC is...

Bibliographic Details
Main Authors: Sawinee Sangsuriyun, Thanawin Rakthanmanon, Kitsana Waiyamai
Language: English
Published: Science Faculty of Chiang Mai University 2019
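Subjects: protein function prediction; associative classification; hierarchical classification; multi-label classification; negative rules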
Online Access: http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=9788
http://cmuir.cmu.ac.th/jspui/handle/6653943832/66000
Institution: Chiang Mai University
id th-cmuir.6653943832-66000
record_format dspace
spelling th-cmuir.6653943832-66000 2019-08-21T09:18:19Z Chiang Mai Journal of Science 46, 1 (Jan 2019), 165-179 0125-2526 Eng
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
language English
topic protein function prediction
associative classification
hierarchical classification
multi-label classification
negative rules
description In this paper, protein function prediction is treated as a complex hierarchical multi-label classification problem. Each instance can be assigned to several classes, and the classes are organized in a hierarchy of parent-child relationships. eHMAC is an extended Hierarchical Multi-label Associative Classification method proposed for automated protein function prediction. The main objective of this paper is to improve both the accuracy and the explanatory ability of Hierarchical Multi-label Associative Classification (HMAC) in predicting the functions of new protein sequences. The idea is to use the Gene Ontology as background knowledge and integrate it into different steps of HMAC. The three domains of the Gene Ontology (molecular function, biological process, and cellular component) are used as background knowledge to generate high-quality classification rules for predicting protein functions. The experimental results showed that eHMAC, using background knowledge, performed significantly better than the previously proposed HMAC. Prediction accuracy was greatly improved, as was the explanatory ability of the function prediction model in terms of associations between motifs and Gene Ontology (GO) terms.
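The two ideas the description combines, hierarchical labels and motif-to-term association rules, can be illustrated with a minimal sketch. This is not the eHMAC or HMAC algorithm from the paper; it only assumes that a protein annotated with a term also counts as annotated with that term's ancestors, and it scores a rule "motif -> term" by the usual support and confidence. The term names, motif names, and the helpers `with_ancestors` and `rule_stats` are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy: child term -> parent terms (not real GO identifiers).
PARENTS: Dict[str, Set[str]] = {
    "binding": {"molecular_function"},
    "catalytic_activity": {"molecular_function"},
    "dna_binding": {"binding"},
    "kinase_activity": {"catalytic_activity"},
}

def with_ancestors(terms: Set[str]) -> Set[str]:
    """Close a label set upward so that every term's ancestors are included."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS.get(term, ()))
    return closed

def rule_stats(
    data: List[Tuple[Set[str], Set[str]]], motif: str, term: str
) -> Tuple[float, float]:
    """Support and confidence of the rule motif -> term on ancestor-expanded labels."""
    n_motif = 0  # proteins containing the motif
    n_both = 0   # proteins containing the motif and carrying the term
    for motifs, labels in data:
        if motif in motifs:
            n_motif += 1
            if term in with_ancestors(labels):
                n_both += 1
    support = n_both / len(data) if data else 0.0
    confidence = n_both / n_motif if n_motif else 0.0
    return support, confidence

# Hypothetical proteins: (motifs found in the sequence, annotated terms).
PROTEINS: List[Tuple[Set[str], Set[str]]] = [
    ({"m1", "m2"}, {"dna_binding"}),
    ({"m1"}, {"binding"}),
    ({"m2"}, {"kinase_activity"}),
    ({"m1", "m3"}, {"kinase_activity"}),
]

if __name__ == "__main__":
    print(with_ancestors({"dna_binding"}))
    print(rule_stats(PROTEINS, "m1", "binding"))
```

In this toy data the rule "m1 -> binding" gains a supporting protein only because the `dna_binding` annotation is expanded to its parent, which is the kind of effect a class hierarchy has on rule statistics.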
author Sawinee Sangsuriyun
Thanawin Rakthanmanon
Kitsana Waiyamai
title Hierarchical Multi-label Associative Classification for Protein Function Prediction Using Gene Ontology
publisher Science Faculty of Chiang Mai University
publishDate 2019
url http://it.science.cmu.ac.th/ejournal/dl.php?journal_id=9788
http://cmuir.cmu.ac.th/jspui/handle/6653943832/66000
_version_ 1681426373676105728