Algorithm design and code optimization to speed-up bioinformatics software

Bibliographic Details
Main Author:	Ritika Jain.
Other Authors:	School of Computer Engineering
Format:	Final Year Project
Language:	English
Published:	2012
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
Online Access:	http://hdl.handle.net/10356/48453
Institution:	Nanyang Technological University

Description
Summary:	LDhat is a Linux-based package written in C, developed at the University of Oxford in 2004, that analyses and estimates recombination rates in large-scale population genetic data using Hudson's likelihood method. It consists of several interlinked programs that estimate recombination rates in phased and unphased data with missing information. These estimates support research such as gene targeting, understanding mutations and predicting the presence of certain disease-causing genes. LDhat is used by many bioinformatics researchers, with the National Institutes of Health, United States of America, being a major user. Several parts of the package can currently take up to several days to generate results, making it resource-intensive. The purpose of this project was to optimise the LDhat algorithm in order to reduce the time LDhat takes to process input files and generate results. Since the program is used for major bioinformatics studies, it was imperative that the optimisation techniques not affect the results generated. The basic method used for speed-up within the scope of this project was parallelising the existing code with OpenMPI, a message-passing (MPI) library, on multi-core processors provided by the Bioinformatics lab. The output was tested against that of the original code to verify the results and to compute the speed-up achieved. Several approaches to parallelisation were employed, and the report explains the reasons for the success or failure of each. The distributed-memory approach to parallelising the code obtained almost linear speed-up in output generation by LDhat. The report compares output graphs and speeds obtained through this approach and makes recommendations that can be similarly applied to other parts of the program.

Similar Items