An improving method for estimating amino acid replacement models

Bibliographic Details
Main Author: Lê, Văn Đạt
Format: Theses and Dissertations
Language: other
Published: Đại học Quốc gia Hà Nội 2016
Online Access: http://repository.vnu.edu.vn/handle/VNU_123/8266
Institution: Vietnam National University, Hanoi
Description
Summary: Amino acid replacement models (also called amino acid substitution models or matrices) play important roles in protein phylogenetic analysis and protein sequence alignment. Dayhoff was the first to propose a method for building amino acid models, in 1972. Currently, maximum likelihood (ML) methods are widely used to estimate popular models such as WAG, LG, and FLU. However, ML methods are slow and impractical for large datasets. The most time-consuming step in estimating matrices is building phylogenetic trees from protein alignments. In this thesis, we propose new methods that overcome this obstacle by splitting large alignments into smaller ones that still contain enough evolutionary information for estimating matrices. Experiments with both the Pfam and FLU datasets show that the proposed methods are about three to nine times faster than the best current method, while the quality of the estimated matrices is nearly the same. Thus, our methods enable researchers to estimate matrices from very large datasets.
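The summary does not say how a large alignment is split. Purely as an illustration of the general idea, the sketch below assumes a simple scheme: randomly partition the sequences (rows) into sub-alignments of at most `max_seqs` sequences, then drop any column that is all gaps within a piece. The function `split_alignment`, the `max_seqs` and `seed` parameters, and the name-to-sequence dictionary representation are hypothetical. The thesis's actual criterion, which is designed to preserve enough evolutionary information in each piece, may well differ.

```python
import math
import random
from typing import Dict, List

# Hypothetical representation: sequence name -> aligned sequence, '-' marks a gap.
Alignment = Dict[str, str]


def split_alignment(aln: Alignment, max_seqs: int, seed: int = 0) -> List[Alignment]:
    """Randomly partition the rows of `aln` into sub-alignments of at most
    `max_seqs` sequences, dropping columns that are all gaps within a piece.

    Illustrative only: this random scheme is an assumption, not the
    splitting criterion used in the thesis.
    """
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")
    names = list(aln)
    random.Random(seed).shuffle(names)
    n_groups = math.ceil(len(names) / max_seqs)
    # Round-robin assignment keeps group sizes within one of each other,
    # and no group exceeds max_seqs.
    groups = [names[g::n_groups] for g in range(n_groups)]

    pieces: List[Alignment] = []
    for group in groups:
        rows = [aln[name] for name in group]
        # Keep only the columns that hold at least one residue in this group.
        keep = [j for j, col in enumerate(zip(*rows)) if any(c != "-" for c in col)]
        pieces.append({name: "".join(aln[name][j] for j in keep) for name in group})
    return pieces


if __name__ == "__main__":
    toy = {"s1": "AC-D", "s2": "A--D", "s3": "-CGD", "s4": "AC-E", "s5": "M--K"}
    for piece in split_alignment(toy, max_seqs=2):
        print(piece)
```

Under this assumption, each piece needs only a small tree. That is the kind of saving the summary attributes to splitting. Combining the pieces into a single matrix estimate is not shown here.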