Design and implementation of parallel bioinformatics algorithms on heterogeneous computing architectures

Bibliographic Details
Main Author:	Liu, Yongchao.
Other Authors:	Douglas Leslie Maskell
Format:	Theses and Dissertations
Language:	English
Published:	2012
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
Online Access:	http://hdl.handle.net/10356/50628
Institution:	Nanyang Technological University

Description
Summary:	Massively parallel DNA sequencing technologies have revolutionized genomics and molecular biology by producing large volumes of high-quality DNA sequence data at a relatively low cost. However, this growth in sequence data establishes the need for more powerful computational hardware infrastructure and more sophisticated software algorithms for efficient management and analysis. This poses a number of challenges to the bioinformatics community in meeting the compute-intensive and data-intensive requirements of current sequence analysis. This thesis makes contributions by conceiving and developing parallel algorithms for three primary research areas in bioinformatics, namely sequence alignment, motif discovery and genome sequencing, targeting heterogeneous computing architectures consisting of CUDA-enabled GPUs, multi-core CPUs, and CPU/GPU clusters. By combining different parallel programming models, the heterogeneous computing architectures are able to provide support for three kinds of computations: device-level computation on GPUs, node-level multi-threaded computation on shared-memory CPUs, and cluster-level parallel and distributed computation over compute nodes. The primary contributions to sequence alignment are the investigation of three parallel algorithms: CUDASW++, MSA-CUDA and MSAProbs. CUDASW++ is a CUDA-based protein sequence database search algorithm for multiple GPUs. It produces better performance in terms of execution speed and accuracy compared to other publicly available tools such as SWPS3, SW-CUDA and NCBI-BLAST+. Both MSA-CUDA and MSAProbs are multiple protein sequence aligners. MSA-CUDA accelerates the ClustalW processing pipeline using CUDA and achieves significant speedups over sequential ClustalW on a single GPU. MSAProbs is a new and practical multi-threaded aligner for shared-memory CPUs, based on pair hidden Markov models and partition function posterior probabilities. It achieves statistically significant alignment accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons, and Probalign, while demonstrating competitive speed. The primary contribution to motif discovery is the investigation of CUDA-MEME, a parallel and distributed motif discovery algorithm based on the MEME algorithm.
