Sequence comparison latent semantic analysis and support vector machine to detect remote protein homology
Remote protein homology detection refers to the detection of structural homology between proteins with weak sequence similarity. Detecting remote homology is important for identifying the function of new proteins, which could assist in curing genetic diseases, performing drug design, and identifying novel enzymes. To detect remote protein...
Main Author: | Ismail, Surayati |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2010 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/16677/7/SurayatiIsmailMFSKSM2010.pdf http://eprints.utm.my/id/eprint/16677/ |
Institution: | Universiti Teknologi Malaysia |
id |
my.utm.16677 |
---|---|
record_format |
eprints |
spelling |
my.utm.16677 2017-09-17T08:13:19Z http://eprints.utm.my/id/eprint/16677/ Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/16677/7/SurayatiIsmailMFSKSM2010.pdf Ismail, Surayati (2010) Sequence comparison latent semantic analysis and support vector machine to detect remote protein homology. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System. |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
Remote protein homology detection refers to the detection of structural homology between proteins with weak sequence similarity. Detecting remote homology is important for identifying the function of new proteins, which could assist in curing genetic diseases, performing drug design, and identifying novel enzymes. Researchers have identified several problems in remote protein homology detection, namely detecting homology in hard-to-align proteins and the high-dimensional protein feature vectors caused by redundant and noisy data. To address these problems, a new computational framework for remote protein homology detection has been developed. The framework begins by extracting the structural similarity of proteins using a highly sensitive structural similarity algorithm, which consists of four steps: split the protein sequences into substrings, calculate similarity using pairwise protein substring alignment, build a guide tree, and extract the high structural similarity using multiple protein sequence alignment. Then, the Latent Semantic Analysis (LSA) algorithm is used to produce feature vectors. The LSA consists of three steps: generate protein pattern blocks using the TEIRESIAS algorithm, remove redundant data using the chi-square algorithm, and remove noisy data using the Singular Value Decomposition (SVD) algorithm. Lastly, the framework uses a Support Vector Machine (SVM) to classify the proteins as homologue or non-homologue members. The proposed framework is evaluated on a dataset from SCOP database version 1.53, and its performance is compared with that of other methods such as the PSI-BLAST and SVM-Pairwise sequence comparison models, the SAM and HMMER generative models, and the SVM-Fisher and SVM-I-Sites discriminative classifier models, in terms of Receiver Operating Characteristic (ROC), Median Rate of False Positives (MRFP), and family-by-family comparison of ROC. The results show that the proposed computational framework outperforms the other remote protein homology detection methods. |
format |
Thesis |
author |
Ismail, Surayati |
title |
Sequence comparison latent semantic analysis and support vector machine to detect remote protein homology |
publishDate |
2010 |
url |
http://eprints.utm.my/id/eprint/16677/7/SurayatiIsmailMFSKSM2010.pdf http://eprints.utm.my/id/eprint/16677/ |
_version_ |
1643646629354405888 |
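
The abstract says that redundant pattern data is removed with a chi-square algorithm, but this record does not give the exact procedure. As an illustration only, the sketch below scores each pattern with the standard chi-square statistic for a 2x2 contingency table (pattern present or absent versus homologue or non-homologue) and keeps the highest-scoring patterns. The function names `chi_square` and `select_patterns`, the toy patterns, and the labels are all hypothetical and are not taken from the thesis.

```python
def chi_square(presence, labels):
    """Chi-square statistic between a binary pattern and binary class labels.

    presence[i] is 1 if the pattern occurs in protein i, else 0.
    labels[i] is 1 if protein i is a homologue (positive), else 0.
    """
    n = len(labels)
    pairs = list(zip(presence, labels))
    a = sum(1 for p, y in pairs if p and y)          # present, positive
    b = sum(1 for p, y in pairs if p and not y)      # present, negative
    c = sum(1 for p, y in pairs if not p and y)      # absent, positive
    d = n - a - b - c                                # absent, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A pattern (or class) that never varies carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_patterns(matrix, labels, keep):
    """Return the `keep` patterns with the highest chi-square score.

    matrix maps each pattern string to its 0/1 occurrence list.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = {pat: chi_square(col, labels) for pat, col in matrix.items()}
    return sorted(scores, key=lambda pat: (-scores[pat], pat))[:keep]


# Toy data (assumed for illustration): six proteins, three positives.
labels = [1, 1, 1, 0, 0, 0]
matrix = {
    "A..C": [1, 1, 1, 0, 0, 0],   # occurs only in positives
    "G.K":  [1, 0, 1, 1, 0, 1],   # occurs equally in both classes
    "L.L":  [1, 1, 0, 0, 0, 0],   # occurs mostly in positives
}
selected = select_patterns(matrix, labels, keep=2)
print(selected)
```

In this toy example the pattern that occurs equally often in both classes scores zero and is dropped, which is the intuition behind using chi-square to discard uninformative features before SVD.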
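
The two evaluation measures named in the abstract, ROC and Median Rate of False Positives (MRFP), can be sketched as follows. This is a minimal illustration under common definitions, not the thesis's evaluation code: the ROC score is taken as the area under the ROC curve (the probability that a positive outranks a negative, with ties counted as half), and the median RFP as the fraction of negatives that score at least as high as the median-scoring positive. The function names and the toy scores are assumptions.

```python
from statistics import median


def roc_score(pos_scores, neg_scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def median_rfp(pos_scores, neg_scores):
    """Fraction of negatives scoring at or above the median positive score."""
    threshold = median(pos_scores)
    false_positives = sum(1 for q in neg_scores if q >= threshold)
    return false_positives / len(neg_scores)


# Toy classifier outputs (assumed): higher means "more likely homologue".
pos = [0.9, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.1]
print(roc_score(pos, neg))    # 9 of 12 pairs ranked correctly
print(median_rfp(pos, neg))   # 1 of 4 negatives reaches the median positive
```

A higher ROC score and a lower median RFP both indicate better separation of homologues from non-homologues, so the two measures are read in opposite directions when comparing methods.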