Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction

Bibliographic Details
Main Author: Kasim, Shahreen
Format: Thesis
Language: English
Published: 2011
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/32110/5/ShahreenKasimPFSKSM2011.pdf
http://eprints.utm.my/id/eprint/32110/
Institution: Universiti Teknologi Malaysia
id my.utm.32110
record_format eprints
spelling my.utm.32110 2018-05-27T07:11:11Z http://eprints.utm.my/id/eprint/32110/ 2011-11 Thesis NonPeerReviewed application/pdf en
Kasim, Shahreen (2011) Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction. PhD thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic QA75 Electronic computers. Computer science
description Gene expression is the process by which information from a gene is used to synthesize a functional gene product. Comprehensive studies of gene expression are useful for predicting gene functions, including predicting annotations for genes of unknown function. However, several issues in gene function prediction need to be addressed, namely resolving multiple fuzzy clusters using biological knowledge and the biological annotations held in existing databases, and handling high-level and low-level expression values. This research therefore aimed to cluster gene expression data by incorporating biological knowledge in order to address these issues. The basic Fuzzy c-Means (FCM) algorithm was introduced to address multiple fuzzy clusters in gene expression. Clustering Functional Annotation (CluFA) was developed to deal with insufficient knowledge by incorporating Gene Ontology (GO) datasets and multiple functional annotation databases. The GO datasets were used to determine the number of clusters as well as the clusters for genes. The evidence codes in the functional annotation databases were used to compute the strength of the association between a data element and a particular cluster. Multi-stage filtering-CluFA (msf-CluFA) was implemented by conducting filtering stages and applying an enhanced Apriori algorithm in order to handle the high-level and low-level expression values. The performance of the proposed method was evaluated in terms of compactness and separation, consistency, and accuracy, using the Eisen and Gasch datasets. Biological validation was also performed by cross-checking the predicted gene functions against the most recent annotation database. The results show that the proposed computational method achieved better results than other methods such as GOFuzzy, FuzzyK, and FuzzySOM in predicting unknown gene function.
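The basic FCM algorithm that the description takes as its starting point can be sketched roughly as below. This is a minimal illustration of standard fuzzy c-means only; it is not the thesis's CluFA or msf-CluFA, and it uses no Gene Ontology data or evidence codes. The function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the stopping tolerance, and the synthetic two-group data are all assumptions made for the example.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means (illustrative sketch, not the thesis's method).

    X is an (n_samples, n_features) array, c the number of clusters,
    m > 1 the fuzzifier. Returns (centers, U), where U[i, k] is the
    membership of sample i in cluster k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Cluster centers: means of the samples weighted by U**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center,
        # floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Hypothetical toy data standing in for expression profiles:
# two well-separated groups of 20 samples with 3 features each.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 0.1, (20, 3)),
    data_rng.normal(5.0, 0.1, (20, 3)),
])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard assignment taken from the fuzzy memberships
```

Each row of `U` gives one sample's degree of membership in every cluster, which is what lets a gene belong to several clusters at once. In this sketch the number of clusters `c` is passed in by hand, whereas the description states that the GO datasets were used to determine it.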
format Thesis
author Kasim, Shahreen
title Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction
publishDate 2011
url http://eprints.utm.my/id/eprint/32110/5/ShahreenKasimPFSKSM2011.pdf
http://eprints.utm.my/id/eprint/32110/