Detection of outlier residues for improving interface prediction in protein heterocomplexes

Bibliographic Details
Main Authors:	Chen, Peng; Wong, Limsoon; Li, Jinyan
Other Authors:	School of Computer Engineering
Format:	Article
Language:	English
Published:	2012 (added to DR-NTU in 2013)
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
Online Access:	https://hdl.handle.net/10356/103829; http://hdl.handle.net/10220/16551
Institution:	Nanyang Technological University

Type:	Journal Article
Citation:	Chen, P., Wong, L. S., & Li, J. Y. (2012). Detection of outlier residues for improving interface prediction in protein heterocomplexes. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1155-1165.
DOI:	10.1109/TCBB.2012.58
ISSN:	1545-5963
Affiliations:	School of Computer Engineering; Bioinformatics Research Centre
Rights:	© 2012 IEEE
Collection:	DR-NTU (NTU Library, Singapore)
Description:	Sequence-based understanding and identification of protein binding interfaces is a challenging research topic because of the complexity of protein systems and the imbalanced distribution of interface and noninterface residues. This paper presents an outlier detection approach to address the redundancy problem in protein interaction data; the cleaned training data are then used to improve prediction performance. We use three novel measures to describe the extent to which a residue is considered an outlier in comparison with the other residues: the distance of a residue instance from the center instance of all residue instances with the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are then used as input to a support vector machine (SVM) ensemble. The SVM ensemble trained on data without outliers performs better than one trained on data with outliers, and our method is also more accurate than many methods from the literature on benchmark data sets. Our empirical studies show that some outlier interface residues are indeed close to noninterface regions, and some outlier noninterface residues are close to interface regions.
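The abstract does not give formulas for Dist, PCL or IWB, so the following Python sketch only illustrates the general idea of score-and-remove outlier cleaning under stated assumptions; it is not the authors' method. It computes a Dist-like score as each instance's Euclidean distance to its class centroid (scaled by the largest such distance), substitutes a k-nearest-neighbour label vote as a hypothetical stand-in for PCL, omits IWB entirely, averages the two scores, and drops instances whose combined score exceeds a threshold. All function names, `k=3`, and `threshold=0.6` are hypothetical choices.

```python
import math


def centroid(points):
    """Component-wise mean of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dist_scores(X, y):
    """Dist-like score (assumption): Euclidean distance from each instance to
    the centroid of its own class, divided by the largest such distance."""
    centers = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in set(y)}
    raw = [math.dist(x, centers[lab]) for x, lab in zip(X, y)]
    top = max(raw) or 1.0  # avoid dividing by zero if every point sits on its centroid
    return [r / top for r in raw]


def neighbour_disagreement(X, y, k=3):
    """Hypothetical stand-in for PCL: 1 minus the fraction of the k nearest
    neighbours (excluding the instance itself) that share its label."""
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        if not nearest:
            scores.append(0.0)
            continue
        same = sum(1 for _, j in nearest if y[j] == yi)
        scores.append(1.0 - same / len(nearest))
    return scores


def remove_outliers(X, y, threshold=0.6, k=3):
    """Average the two scores (IWB is omitted) and keep only instances whose
    combined score does not exceed the threshold."""
    combined = [(a + b) / 2 for a, b in zip(dist_scores(X, y), neighbour_disagreement(X, y, k))]
    keep = [i for i, s in enumerate(combined) if s <= threshold]
    return [X[i] for i in keep], [y[i] for i in keep], combined


if __name__ == "__main__":
    # Toy 2-D data: label 1 = "interface", label 0 = "noninterface".
    # The last point carries label 1 but sits inside the label-0 cluster.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
    y = [1, 1, 1, 1, 0, 0, 0, 0, 1]
    X_clean, y_clean, scores = remove_outliers(X, y)
    print(len(X_clean), [round(s, 3) for s in scores])
```

On this toy data only the mislabelled point at (5.5, 5.5) exceeds the threshold (its combined score is 1.0, while every other score stays below 0.25), so eight instances remain. In the paper the cleaned data then feed an SVM ensemble, which this sketch does not attempt to reproduce.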