On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properti...

Bibliographic Details
Main Authors:	Wong, Wing-Cheong; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank
Other Authors:	School of Computer Engineering
Format:	Article
Language:	English
Published:	2014
Subjects:	DRNTU::Science::Biological sciences
Online Access:	https://hdl.handle.net/10356/103901 http://hdl.handle.net/10220/20043
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-103901
record_format	dspace
spelling	sg-ntu-dr.10356-103901 2023-02-28T17:04:56Z On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation Wong, Wing-Cheong; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank School of Computer Engineering; School of Biological Sciences DRNTU::Science::Biological sciences Published version 2014-07-03T04:06:49Z 2019-12-06T21:22:41Z 2014 Journal Article Wong, W.-C., Maurer-Stroh, S., Eisenhaber, B., & Eisenhaber, F. (2014). On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation. BMC Bioinformatics, 15(1), 166. 1471-2105 https://hdl.handle.net/10356/103901 http://hdl.handle.net/10220/20043 10.1186/1471-2105-15-166 24890864 en BMC bioinformatics © 2014 Wong et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Science::Biological sciences
description	Background: Protein sequence similarities to any type of non-globular segment (coiled coils, low-complexity regions, transmembrane regions, long loops, etc., where either positional sequence conservation is the result of a very simple, physically induced pattern or integral sequence properties are critical) are pertinent sources of mistaken homologies. Regrettably, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute for manual handling of these cases. Quantitative criteria are required to suppress function annotation transfer that results from false homology assignments. Results: The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks that confer the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept to cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Conclusions: Sequence similarity score dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison.
author2	School of Computer Engineering
author_facet	School of Computer Engineering; Wong, Wing-Cheong; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank
format	Article
author	Wong, Wing-Cheong; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank
author_sort	Wong, Wing-Cheong
title	On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation
publishDate	2014
url	https://hdl.handle.net/10356/103901 http://hdl.handle.net/10220/20043
_version_	1759852906375282688
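
The dissection idea summarized in the description can be illustrated with a minimal sketch. It is not DissectHMMER and does not use HMMER's scoring. It assumes, purely for illustration, that a total similarity score is available as additive per-column values, that each alignment column has been labeled as fold-critical or other, and that a single score threshold stands in for a significance test. The names `Segment`, `dissect_score`, `supports_homology`, the labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Segment:
    """Half-open alignment-column range [start, end) with a class label."""
    start: int
    end: int
    label: str  # illustrative labels: "fold_critical" or "other"


def dissect_score(column_scores: Sequence[float],
                  segments: List[Segment]) -> Dict[str, float]:
    """Sum per-column scores into one contribution per segment label.

    Columns not covered by any segment are reported as "unassigned".
    Assumes the total score is additive over columns (an assumption).
    """
    totals: Dict[str, float] = {}
    covered = [False] * len(column_scores)
    for seg in segments:
        if not (0 <= seg.start <= seg.end <= len(column_scores)):
            raise ValueError(f"segment out of range: {seg}")
        for i in range(seg.start, seg.end):
            if covered[i]:
                raise ValueError(f"column {i} is in more than one segment")
            covered[i] = True
        part = sum(column_scores[seg.start:seg.end])
        totals[seg.label] = totals.get(seg.label, 0.0) + part
    if not all(covered):
        totals["unassigned"] = sum(
            s for s, c in zip(column_scores, covered) if not c
        )
    return totals


def supports_homology(contributions: Dict[str, float],
                      min_fold_score: float) -> bool:
    """Accept a hit only if the fold-critical part passes on its own."""
    return contributions.get("fold_critical", 0.0) >= min_fold_score


# Hypothetical hit: most of the score comes from a non-fold-critical segment.
column_scores = [2.0, 1.5, -0.5, 4.0, 3.5, 0.5]
segments = [Segment(0, 3, "fold_critical"), Segment(3, 5, "other")]
threshold = 5.0  # hypothetical cutoff standing in for a significance test

contributions = dissect_score(column_scores, segments)
total_passes = sum(column_scores) >= threshold
fold_passes = supports_homology(contributions, threshold)
print(contributions)
print(total_passes, fold_passes)
```

In this made-up example the total score clears the cutoff while the fold-critical contribution does not, which is the kind of hit the abstract proposes to reject.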

Similar Items