BOPA : a bayesian hierarchical model for outlier expression detection

Bibliographic Details
Main Author: Hong, Zhaoping
Other Authors: Lian Heng
Format: Theses and Dissertations
Language: English
Published: 2012
Online Access: https://hdl.handle.net/10356/49053
Institution: Nanyang Technological University
Description
Summary: DNA microarray technologies can simultaneously measure the expression levels of thousands of genes in cells. A common task with microarrays is to determine which genes are differentially expressed under two biological conditions of interest (e.g. cancerous versus non-cancerous cells). Such data sets often contain thousands of genes per individual but relatively few individuals. Additionally, in many cancer studies, a gene may be differentially expressed in some but not all of the disease samples, reflecting the complexity of the underlying disease. Traditional t-tests assume a mean shift for the tumor samples compared to normal samples and are thus not structured to capture partial differential expression. More powerful tests designed specifically for this situation are needed to find genes with heterogeneous expression associated with possible subtypes of the cancer. This thesis proposes a Bayesian model for cancer outlier profile analysis (BOPA). We build on the Gamma-Gamma model introduced in Newton et al. (2001), Kendziorski et al. (2003), and Newton et al. (2004) by using a five-component mixture model to represent various differential expression patterns. The hierarchical mixture model explicitly accounts for outlier expression, and inferences are based on samples from the posterior distributions generated by a Markov chain Monte Carlo algorithm. We present analyses of simulated and real-life data sets to demonstrate the proposed methodology.
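
The point about t-tests and partial differential expression can be illustrated with a small sketch. This is not the BOPA model and does not come from the thesis; it only compares a Welch t statistic for one hypothetical gene under two made-up scenarios. The expression values, the sample size of 10 per group, the shift of 2.0, and the names `welch_t`, `tumor_shift`, and `tumor_partial` are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic for mean(a) - mean(b)."""
    # statistics.variance is the sample variance (divides by n - 1).
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical log-expression values for one gene in 10 normal samples.
normal = [0.0, 0.2, -0.2, 0.1, -0.1, 0.0, 0.2, -0.2, 0.1, -0.1]

# Scenario A (assumed): every tumor sample is shifted up by 2.0.
tumor_shift = [x + 2.0 for x in normal]

# Scenario B (assumed): only 2 of the 10 tumor samples are shifted up by 2.0,
# mimicking a gene that is over-expressed in a subset of disease samples.
tumor_partial = [x + 2.0 if i < 2 else x for i, x in enumerate(normal)]

t_shift = welch_t(tumor_shift, normal)
t_partial = welch_t(tumor_partial, normal)

print(f"t statistic, shift in all tumor samples:   {t_shift:.2f}")
print(f"t statistic, shift in 2 of 10 tumor samples: {t_partial:.2f}")
```

In this toy setting the statistic is large when all tumor samples shift and small when only a subset does, because the outliers raise the group mean only slightly while inflating the group variance. That is the kind of pattern a model with explicit outlier components is meant to pick up.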