The Metric Space of Proteins: Comparative Study of Clustering Algorithms
A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually...
Main Authors: | SASSON, Ori; Linial, Nathan; Linial, Michal |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2002 |
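Subjects: | protein families; protein classification; sequence alignment; clustering; Bioinformatics; Computer Sciences |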
Online Access: | https://ink.library.smu.edu.sg/sis_research/1249 http://bioinformatics.oxfordjournals.org/content/18/suppl_1/S14.abstract |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-2248 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-2248; 2013-07-23T06:20:00Z; 2002-04-01T08:00:00Z; text; info:doi/10.1093/bioinformatics/18.suppl_1.S14; Research Collection School Of Computing and Information Systems; eng |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
topic |
protein families protein classification sequence alignment clustering Bioinformatics Computer Sciences |
description |
A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation. We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation. |
format |
text |
author |
SASSON, Ori Linial, Nathan Linial, Michal |
author_sort |
SASSON, Ori |
title |
The Metric Space of Proteins: Comparative Study of Clustering Algorithms |
url |
https://ink.library.smu.edu.sg/sis_research/1249 http://bioinformatics.oxfordjournals.org/content/18/suppl_1/S14.abstract |
_version_ |
1770570928426582016 |
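
The bottom-up clustering with alternative merging rules that the description outlines can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the four toy proteins A to D, their pairwise dissimilarity scores (standing in for all-against-all BLAST results), and the two merging rules (arithmetic and geometric mean of the pairwise scores between two clusters). The sketch builds one merge hierarchy per rule and then keeps the clusters that both rules agree on.

```python
from itertools import combinations
from math import prod


def arithmetic_mean(scores):
    return sum(scores) / len(scores)


def geometric_mean(scores):
    return prod(scores) ** (1.0 / len(scores))


def cluster(items, dist, rule):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    `dist` maps frozenset({x, y}) to a dissimilarity score (lower = more similar).
    `rule` reduces the list of pairwise scores between two clusters to one number.
    Returns the merged clusters in the order they were formed (the hierarchy).
    """
    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            return rule([dist[frozenset((x, y))] for x in a for y in b])

        # Pick the pair of clusters with the smallest linkage score.
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges


# Hypothetical toy scores, not real BLAST output.
toy_dist = {
    frozenset("AB"): 1.0,
    frozenset("CD"): 2.0,
    frozenset("AC"): 4.0,
    frozenset("AD"): 9.0,
    frozenset("BC"): 9.0,
    frozenset("BD"): 4.0,
}

by_arith = cluster("ABCD", toy_dist, arithmetic_mean)
by_geo = cluster("ABCD", toy_dist, geometric_mean)

# Clusters produced under both merging rules.
consensus = set(by_arith) & set(by_geo)
```

Passing the merging rule as a function keeps the merge loop identical across rules, so any difference between the resulting hierarchies comes from the rule alone. Checking such consensus clusters against an external annotation such as InterPro would be a separate step that this sketch does not attempt.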