BOPA: a Bayesian hierarchical model for outlier expression detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2012 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/49053 |
Institution: | Nanyang Technological University |
Summary: | DNA microarray technologies can simultaneously measure the expression levels of thousands of genes in cells. A common task with microarrays is to determine which genes are differentially expressed under two biological conditions of interest (e.g. cancerous versus non-cancerous cells). Such data sets often contain thousands of genes per individual but relatively few individuals. Additionally, in many cancer studies, a gene may be expressed in some but not all of the disease samples, reflecting the complexity of the underlying disease. Traditional t-tests assume a mean shift in the tumor samples relative to the normal samples and are thus not structured to capture partial differential expression. More powerful tests specifically designed for this situation are needed to find genes with heterogeneous expression associated with possible subtypes of the cancer. This thesis proposes a Bayesian model for cancer outlier profile analysis (BOPA). We build on the Gamma-Gamma model introduced in Newton et al. (2001), Kendziorski et al. (2003) and Newton et al. (2004) by using a five-component mixture model to represent various differential expression patterns. The hierarchical mixture model explicitly accounts for outlier expression, and inference is based on samples from the posterior distributions generated by a Markov chain Monte Carlo algorithm. We present analyses of simulated and real-life data sets to demonstrate the proposed methodology. |
---|
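
The summary's point about t-tests and partial differential expression can be illustrated with a small sketch. This is not the BOPA model or any code from the thesis. It only computes a Welch two-sample t statistic on made-up expression values (all numbers and variable names below are hypothetical) for two cases with nearly the same mean difference: every tumor sample shifted, versus only two of ten tumor samples strongly over-expressed.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Welch two-sample t statistic (does not assume equal variances)."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


# Hypothetical expression values for one gene in 10 normal samples.
normal = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 0.9, 1.0, 1.0]

# Case 1: homogeneous mean shift, every tumor sample is raised by 1.0.
tumor_shift = [x + 1.0 for x in normal]

# Case 2: partial differential expression, only 2 of 10 tumor samples
# are over-expressed; the other 8 look like normal tissue.
tumor_outlier = normal[:8] + [6.0, 6.2]

t_shift = welch_t(tumor_shift, normal)
t_outlier = welch_t(tumor_outlier, normal)

print(f"mean difference (shift):   {mean(tumor_shift) - mean(normal):.2f}")
print(f"mean difference (outlier): {mean(tumor_outlier) - mean(normal):.2f}")
print(f"t statistic (shift):   {t_shift:.2f}")
print(f"t statistic (outlier): {t_outlier:.2f}")
```

Although the two tumor groups differ from the normal group by almost the same average amount, the outlier case inflates the within-group variance, so its t statistic is far smaller. This is the kind of heterogeneous pattern that a model with explicit outlier components is meant to pick up.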