Design and analysis of bioinformatics algorithms on an FPGA platform

The appearance of newer sequencing technologies has largely improved the development of molecular biology and genomic research. A large volume of gene information or protein data can be generated with lower cost, which leads to the exponential growth of existing gene banks or databases. Thus, it bec...

Full description

Saved in:
Bibliographic Details
Main Author: Chen, Yupeng
Other Authors: Bertil Schmidt
Format: Theses and Dissertations
Language:English
Published: 2014
Subjects:
Online Access:https://hdl.handle.net/10356/59979
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-59979
record_format dspace
spelling sg-ntu-dr.10356-599792023-03-04T00:43:15Z Design and analysis of bioinformatics algorithms on an FPGA platform Chen, Yupeng Bertil Schmidt Douglas Leslie Maskell School of Computer Engineering Centre for High Performance Embedded Systems DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability The appearance of newer sequencing technologies has largely improved the development of molecular biology and genomic research. A large volume of gene information or protein data can be generated with lower cost, which leads to the exponential growth of existing gene banks or databases. Thus, it becomes a challenging task for conventional algorithms or tools to extract information with biological significance among these ever increasing databases. There is an urgent need for innovative methods, algorithms, or tools to accomplish these complicated data analysis tasks on a more computationally powerful platform. After decades of development, the FPGA has proved itself in the field of high performance reconfigurable computation. For each generation, one can expect an immediate performance boost with the help of newer manufacturing technologies and a larger volume of resources on a single chip, both of which make it a competitive candidate for application acceleration. This thesis investigates the potential of solving bioinformatics problems utilizing the outstanding computation capability of FPGAs. To give a quantitative analysis on an FPGA’s performance, we choose two bioinformatics problems as the case study: genome search and short read alignment. An efficient architecture is proposed and implemented to maximize the performance of each application on a Virtex-5 FPGA platform. For the genome search problem, we focused on the most time-consuming stage of NCBI BLASTN. Two innovative data structures, a parallel partitioned Bloom filter and a block hash table, are proposed to improve the computation efficiency. The Bloom filter design can support 16 data queries simultaneously with high filtration efficiency; while the latter bypasses the stringent requirement of a perfect/near-perfect hash function. The final system achieves over 10 times speedup against the single-thread NCBI BLASTN program testing with a 100kbase query sequence. In addition, the experiments also show that the match rate together with the degree of parallelism influence the system’s overall performance. For the short read alignment problem, the key is to efficiently locate the segment with a high degree of similarity in the reference genome for each short read. We choose the banded Smith-Waterman algorithm as the align core to narrow down the search space. An innovative architecture is proposed to approximate the conventional algorithm but with a higher degree of parallelism. The align core alone achieves over 30 times speedup. The final design is a hybrid system utilizing both CPU and FPGA. The influence of different partitioning strategies is examined and the performance analysis also indicates that the performance bottleneck changes from the alignment computation to the candidate region search. DOCTOR OF PHILOSOPHY (SCE) 2014-05-21T06:21:49Z 2014-05-21T06:21:49Z 2014 2014 Thesis Chen, Y. (2014). Design and analysis of bioinformatics algorithms on an FPGA platform. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/59979 10.32657/10356/59979 en 173 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability
spellingShingle DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability
Chen, Yupeng
Design and analysis of bioinformatics algorithms on an FPGA platform
description The appearance of newer sequencing technologies has largely improved the development of molecular biology and genomic research. A large volume of gene information or protein data can be generated with lower cost, which leads to the exponential growth of existing gene banks or databases. Thus, it becomes a challenging task for conventional algorithms or tools to extract information with biological significance among these ever increasing databases. There is an urgent need for innovative methods, algorithms, or tools to accomplish these complicated data analysis tasks on a more computationally powerful platform. After decades of development, the FPGA has proved itself in the field of high performance reconfigurable computation. For each generation, one can expect an immediate performance boost with the help of newer manufacturing technologies and a larger volume of resources on a single chip, both of which make it a competitive candidate for application acceleration. This thesis investigates the potential of solving bioinformatics problems utilizing the outstanding computation capability of FPGAs. To give a quantitative analysis on an FPGA’s performance, we choose two bioinformatics problems as the case study: genome search and short read alignment. An efficient architecture is proposed and implemented to maximize the performance of each application on a Virtex-5 FPGA platform. For the genome search problem, we focused on the most time-consuming stage of NCBI BLASTN. Two innovative data structures, a parallel partitioned Bloom filter and a block hash table, are proposed to improve the computation efficiency. The Bloom filter design can support 16 data queries simultaneously with high filtration efficiency; while the latter bypasses the stringent requirement of a perfect/near-perfect hash function. The final system achieves over 10 times speedup against the single-thread NCBI BLASTN program testing with a 100kbase query sequence. In addition, the experiments also show that the match rate together with the degree of parallelism influence the system’s overall performance. For the short read alignment problem, the key is to efficiently locate the segment with a high degree of similarity in the reference genome for each short read. We choose the banded Smith-Waterman algorithm as the align core to narrow down the search space. An innovative architecture is proposed to approximate the conventional algorithm but with a higher degree of parallelism. The align core alone achieves over 30 times speedup. The final design is a hybrid system utilizing both CPU and FPGA. The influence of different partitioning strategies is examined and the performance analysis also indicates that the performance bottleneck changes from the alignment computation to the candidate region search.
author2 Bertil Schmidt
author_facet Bertil Schmidt
Chen, Yupeng
format Theses and Dissertations
author Chen, Yupeng
author_sort Chen, Yupeng
title Design and analysis of bioinformatics algorithms on an FPGA platform
title_short Design and analysis of bioinformatics algorithms on an FPGA platform
title_full Design and analysis of bioinformatics algorithms on an FPGA platform
title_fullStr Design and analysis of bioinformatics algorithms on an FPGA platform
title_full_unstemmed Design and analysis of bioinformatics algorithms on an FPGA platform
title_sort design and analysis of bioinformatics algorithms on an fpga platform
publishDate 2014
url https://hdl.handle.net/10356/59979
_version_ 1759853634181398528