Evolutionary approaches for predicting MHC peptide binding
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2009 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/14955 |
Institution: | Nanyang Technological University |
Summary: | T cells of the immune system recognize short linear peptides, known as epitopes, that are derived from the degradation of protein antigens and presented by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells. Antigen presentation through the MHC class I and class II pathways is critical for the initiation and regulation of immune responses. Recognition of T cell epitopes is a key regulatory step in the control of infectious diseases, autoimmune diseases, and tumors. Accurate computational prediction can substantially speed up the identification of potential T cell epitopes and thereby reduce the number of wet-lab experiments needed to investigate the molecular basis of immunity. Predicting peptide binding to MHC class II molecules is more difficult than predicting binding to class I molecules because class II molecules can bind peptides of different lengths. Variable-length binding peptides in a dataset restrict the use of classifiers, because the number of input features to a classifier must be constant. In this thesis, we address the problem of predicting peptide binding to MHC class II molecules. The varying lengths of binding peptides have been a major bottleneck for earlier motif detection techniques. We formulate the task as a multi-objective optimization problem which, in addition to accommodating variable-length peptides, allows the incorporation of a priori knowledge of motifs and binding residues. Often, information on binding motifs (acquired experimentally or through prediction models) is available alongside the binding peptides. To accommodate such a priori information, we introduce the problem of guided discovery of motifs. This thesis proposes two techniques for guiding the discovery of motifs: using consensus sequences and using anchor residues. Self-discovery of motifs is a special case of guided discovery that uses only the information from binders and non-binders. The proposed approaches are implemented using a multi-objective evolutionary algorithm (MOEA) and a genetic annealing algorithm (GAA). The motifs derived by the proposed MOEA and GAA approaches demonstrate excellent predictive performance on a number of benchmark datasets. These methods also make it possible to identify a motif for a dataset for which a consensus motif had previously been difficult to find. In addition, MOEA- and GAA-derived motifs generalize better to other datasets than motifs derived with a number of other computational techniques and than experimentally determined motifs. We also study the effects of using positive samples alone in building predictive models. |
---|
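The summary describes scoring variable-length peptides with a motif and treating motif quality as a multi-objective problem. The Python sketch below illustrates that general idea only; it is not the thesis's MOEA or GAA. Everything in it is an assumption made for illustration: the motif representation (one set of allowed residues per position), the sliding-window score, the two objectives (fraction of binders matched, fraction of non-binders rejected), the threshold, and the toy peptide strings, which are invented and not real binding data.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical motif representation: one set of allowed residues per position.
Motif = List[Set[str]]


def best_window_score(peptide: str, motif: Motif) -> int:
    """Slide the fixed-width motif along a peptide of any length and
    return the highest number of matching positions over all windows."""
    width = len(motif)
    if len(peptide) < width:
        return 0  # peptide too short to contain the motif
    best = 0
    for start in range(len(peptide) - width + 1):
        window = peptide[start:start + width]
        matches = sum(1 for residue, allowed in zip(window, motif)
                      if residue in allowed)
        best = max(best, matches)
    return best


def objectives(motif: Motif, binders: Sequence[str],
               non_binders: Sequence[str], threshold: int) -> Tuple[float, float]:
    """Two illustrative objectives to maximize (assumed, not from the thesis):
    fraction of binders matched and fraction of non-binders rejected."""
    matched = sum(best_window_score(p, motif) >= threshold for p in binders)
    rejected = sum(best_window_score(p, motif) < threshold for p in non_binders)
    return matched / len(binders), rejected / len(non_binders)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for maximization: a is no worse than b in every
    objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


# Toy example with a 3-position motif and made-up peptide strings.
toy_motif: Motif = [{"L", "I"}, {"A"}, {"K", "R"}]
toy_binders = ["GGLAKGG", "IARTT"]
toy_non_binders = ["GGGGG", "LAGGGG"]
toy_objectives = objectives(toy_motif, toy_binders, toy_non_binders, threshold=3)
```

Because the motif has a fixed width and is slid across each peptide, peptides of different lengths all reduce to a single score, which is one way to sidestep the fixed-input-size restriction mentioned in the summary. An evolutionary search would then compare candidate motifs with a test such as `dominates` rather than collapsing the objectives into one number.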