The Metric Space of Proteins: Comparative Study of Clustering Algorithms

A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually...

Bibliographic Details
Main Authors: SASSON, Ori; Linial, Nathan; Linial, Michal
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2002
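Subjects: protein families; protein classification; sequence alignment; clustering; Bioinformatics; Computer Sciences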
Online Access: https://ink.library.smu.edu.sg/sis_research/1249
http://bioinformatics.oxfordjournals.org/content/18/suppl_1/S14.abstract
Institution: Singapore Management University
id sg-smu-ink.sis_research-2248
record_format dspace
spelling sg-smu-ink.sis_research-2248 2013-07-23T06:20:00Z 2002-04-01T08:00:00Z text info:doi/10.1093/bioinformatics/18.suppl_1.S14 Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic protein families
protein classification
sequence alignment
clustering
Bioinformatics
Computer Sciences
description A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation. We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation.
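The bottom-up process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not name the merging rules, so the arithmetic, geometric and harmonic means used here are assumptions chosen for illustration, as are the protein ids, the toy scores (standing in for BLAST-derived dissimilarities, lower meaning more similar) and the `default` value for unreported pairs. Validation against InterPro is not shown.

```python
from itertools import combinations
from statistics import mean, geometric_mean, harmonic_mean


def cluster(items, score, rule, default=100.0):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    items:   list of hashable ids (hypothetical protein names here)
    score:   dict mapping frozenset({a, b}) -> positive dissimilarity
    rule:    function averaging a list of pairwise scores (the merging rule)
    default: score assumed for pairs with no reported hit (an assumption)
    Returns the merged clusters in the order they were formed.
    """
    clusters = [frozenset([x]) for x in items]
    merges = []

    def dist(pair):
        # Distance between two clusters = rule applied to all cross pairs.
        a, b = pair
        return rule([score.get(frozenset((x, y)), default)
                     for x in a for y in b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=dist)
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(merged)
    return merges


# Toy, made-up dissimilarities for four hypothetical proteins.
toy_scores = {
    frozenset(("p1", "p2")): 1.0,
    frozenset(("p3", "p4")): 2.0,
    frozenset(("p1", "p3")): 50.0,
}
proteins = ["p1", "p2", "p3", "p4"]

# Assumed merging rules; the record itself does not specify them.
rules = {"arithmetic": mean, "geometric": geometric_mean,
         "harmonic": harmonic_mean}

hierarchies = {name: set(cluster(proteins, toy_scores, rule))
               for name, rule in rules.items()}

# Clusters that appear under every merging rule.
consensus = set.intersection(*hierarchies.values())
```

Here `consensus` collects the clusters produced under every rule, which mirrors the idea in the abstract that clusters consistent across several merging rules are the ones worth comparing against an external annotation.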
format text
author SASSON, Ori
Linial, Nathan
Linial, Michal
title The Metric Space of Proteins: Comparative Study of Clustering Algorithms