Design and implementation of parallel bioinformatics algorithms on heterogeneous computing architectures

Massively parallel DNA sequencing technologies have revolutionized genomics and molecular biology by producing large volumes of high-quality DNA sequence data at a relatively low cost. However, this growth in sequence data establishes the need for more powerful computational hardware infrastructure a...

Bibliographic Details
Main Author: Liu, Yongchao.
Other Authors: Douglas Leslie Maskell
Format: Theses and Dissertations
Language: English
Published: 2012
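Subjects: DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences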
Online Access: http://hdl.handle.net/10356/50628
Institution: Nanyang Technological University
id sg-ntu-dr.10356-50628
record_format dspace
spelling sg-ntu-dr.10356-50628 2023-03-04T00:34:33Z Design and implementation of parallel bioinformatics algorithms on heterogeneous computing architectures Liu, Yongchao. Douglas Leslie Maskell School of Computer Engineering Centre for High Performance Embedded Systems DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences Massively parallel DNA sequencing technologies have revolutionized genomics and molecular biology by producing large volumes of high-quality DNA sequence data at a relatively low cost. However, this growth in sequence data establishes the need for more powerful computational hardware infrastructure and more sophisticated software algorithms for efficient management and analysis. This poses a number of challenges to the bioinformatics community in order to meet the compute-intensive and data-intensive requirements of current sequence analysis. This thesis makes contributions by conceiving and developing parallel algorithms for three primary research areas in bioinformatics, i.e. sequence alignment, motif discovery and genome sequencing, targeting heterogeneous computing architectures consisting of CUDA-enabled GPUs, multi-core CPUs, and CPU/GPU clusters. By combining different parallel programming models, the heterogeneous computing architectures are able to provide support for three kinds of computations: device-level computation on GPUs, node-level multi-threaded computation on shared-memory CPUs, and cluster-level parallel and distributed computation over compute nodes. The primary contributions to sequence alignment are the investigation of three parallel algorithms: CUDASW++, MSA-CUDA and MSAProbs. CUDASW++ is a CUDA-based protein sequence database search algorithm for multiple GPUs. It produces better performance in terms of execution speed and accuracy compared to other publicly available tools such as SWPS3, SW-CUDA and NCBI-BLAST+. Both MSA-CUDA and MSAProbs are multiple protein sequence aligners. MSA-CUDA accelerates the ClustalW processing pipeline using CUDA and achieves significant speedups over sequential ClustalW on a single GPU. MSAProbs is a new and practical multi-threaded aligner based on the pair hidden Markov models and partition function posterior probabilities for shared-memory CPUs. It achieves statistically significant alignment accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons, and Probalign, while demonstrating competitive speed. The primary contribution to motif discovery is the investigation of CUDA-MEME, a parallel and distributed motif discovery algorithm based on the MEME algorithm. Doctor of Philosophy 2012-08-07T09:15:52Z 2012-08-07T09:15:52Z 2012 2012 Thesis http://hdl.handle.net/10356/50628 en 201 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
description Massively parallel DNA sequencing technologies have revolutionized genomics and molecular biology by producing large volumes of high-quality DNA sequence data at a relatively low cost. However, this growth in sequence data establishes the need for more powerful computational hardware infrastructure and more sophisticated software algorithms for efficient management and analysis. This poses a number of challenges to the bioinformatics community in order to meet the compute-intensive and data-intensive requirements of current sequence analysis. This thesis makes contributions by conceiving and developing parallel algorithms for three primary research areas in bioinformatics, i.e. sequence alignment, motif discovery and genome sequencing, targeting heterogeneous computing architectures consisting of CUDA-enabled GPUs, multi-core CPUs, and CPU/GPU clusters. By combining different parallel programming models, the heterogeneous computing architectures are able to provide support for three kinds of computations: device-level computation on GPUs, node-level multi-threaded computation on shared-memory CPUs, and cluster-level parallel and distributed computation over compute nodes. The primary contributions to sequence alignment are the investigation of three parallel algorithms: CUDASW++, MSA-CUDA and MSAProbs. CUDASW++ is a CUDA-based protein sequence database search algorithm for multiple GPUs. It produces better performance in terms of execution speed and accuracy compared to other publicly available tools such as SWPS3, SW-CUDA and NCBI-BLAST+. Both MSA-CUDA and MSAProbs are multiple protein sequence aligners. MSA-CUDA accelerates the ClustalW processing pipeline using CUDA and achieves significant speedups over sequential ClustalW on a single GPU. MSAProbs is a new and practical multi-threaded aligner based on the pair hidden Markov models and partition function posterior probabilities for shared-memory CPUs. It achieves statistically significant alignment accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons, and Probalign, while demonstrating competitive speed. The primary contribution to motif discovery is the investigation of CUDA-MEME, a parallel and distributed motif discovery algorithm based on the MEME algorithm.
author2 Douglas Leslie Maskell
author_facet Douglas Leslie Maskell
Liu, Yongchao.
format Theses and Dissertations
author Liu, Yongchao.
author_sort Liu, Yongchao.
title Design and implementation of parallel bioinformatics algorithms on heterogeneous computing architectures
publishDate 2012
url http://hdl.handle.net/10356/50628
_version_ 1759854297552519168