Improve the computation efficiency in epigenome-wide and genome-wide association studies

Background: Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) promise to reveal the relationships among genetic variants, epigenetic changes and human diseases. The challenge lies in their computational burden, which stems from the volume of data returned from epige...

Bibliographic Details
Main Author:	Tran, Nhat Sang
Other Authors:	Kwoh Chee Keong
Format:	Final Year Project
Language:	English
Published:	2015
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
Online Access:	http://hdl.handle.net/10356/65735
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-65735
record_format	dspace
spelling	sg-ntu-dr.10356-65735 2023-03-03T20:50:35Z Improve the computation efficiency in epigenome-wide and genome-wide association studies Tran, Nhat Sang Kwoh Chee Keong School of Computer Engineering Centre for Computational Intelligence DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences Bachelor of Engineering (Computer Science) 2015-12-10T08:50:33Z 2015-12-10T08:50:33Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/65735 en Nanyang Technological University 46 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
spellingShingle	DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences Tran, Nhat Sang Improve the computation efficiency in epigenome-wide and genome-wide association studies
description	Background: Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) promise to reveal the relationships among genetic variants, epigenetic changes and human diseases. The challenge lies in their computational burden, which stems from the volume of data returned by epigenetic measurements (450k CpGs measured by the Illumina Infinium 450k array) and genetic variants (millions of SNPs from sequencing technology). Because EWAS is a young and emerging field, comprehensive computational support lags far behind demand. An R package called GEM was created to discover how genetic variants (G) and environmental factors (E) influence methylation changes (M) in EWAS. The first generation of GEM uses a linear model to determine the associations, so it struggles to run millions of regressions on large samples. Solution: In this project, we implement the second generation of GEM. We replaced the linear regression in the old GEM package with the newly developed semi-parallel approach. We first simulated pseudo methylation data, SNP data and environment data. We then benchmarked the new Gmodel and Emodel by comparing their results with those of the corresponding standard functions in the old GEM. The new Emodel was around 500 times more efficient with 1,000 samples and 10,000 CpGs; the new Gmodel was more than 1,500 times more efficient with the same sample and CpG sizes and 60,000 SNPs. Conclusion: We implemented the new models and reported their computational efficiency. We also analysed the accuracy of their results. This quality control process showed that our solution is reliable and should be applied in real studies.
author2	Kwoh Chee Keong
author_facet	Kwoh Chee Keong Tran, Nhat Sang
format	Final Year Project
author	Tran, Nhat Sang
author_sort	Tran, Nhat Sang
title	Improve the computation efficiency in epigenome-wide and genome-wide association studies
title_sort	improve the computation efficiency in epigenome-wide and genome-wide association studies
publishDate	2015
url	http://hdl.handle.net/10356/65735
_version_	1759855985483055104
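
The description above contrasts fitting one linear regression per CpG with a semi-parallel approach. The record does not show the GEM code, so the following is only a minimal sketch of the general idea, assuming a single predictor (for example one environmental factor) plus an intercept and no further covariates. The function names `fit_slopes_one_by_one` and `fit_slopes_shared` are hypothetical and are not part of the GEM package. The point it illustrates is that quantities depending only on the predictor can be computed once and reused for every CpG, instead of being recomputed in each regression.

```python
# Illustrative sketch only; not code from the GEM package.
# Assumption: simple regression y = a + b*x with one predictor x
# shared by all CpGs. Y is a list of CpG vectors, each as long as x.

def fit_slopes_one_by_one(x, Y):
    """Baseline: fit every CpG separately, recomputing the x statistics each time."""
    slopes = []
    for y in Y:
        n = len(x)
        x_mean = sum(x) / n
        y_mean = sum(y) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes


def fit_slopes_shared(x, Y):
    """Shared-work variant: centre x and compute its sum of squares once."""
    n = len(x)
    x_mean = sum(x) / n
    xc = [xi - x_mean for xi in x]      # centred predictor, computed once
    sxx = sum(v * v for v in xc)        # computed once, reused for all CpGs
    # The centred predictor sums to zero, so sum(xc * (y - mean(y)))
    # equals sum(xc * y); y does not need to be centred.
    return [sum(c * yi for c, yi in zip(xc, y)) / sxx for y in Y]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    Y = [[1.0, 3.0, 5.0, 7.0], [2.0, 1.0, 0.0, -1.0]]
    print(fit_slopes_one_by_one(x, Y))
    print(fit_slopes_shared(x, Y))
```

In a matrix language such as R, the per-CpG sums in the second function would typically be expressed as a single matrix product over all CpGs at once; this sketch uses plain loops only to stay dependency-free.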

Similar Items