FICS: Fast DNA/RNA to amino acid alignment using data level parallelism

Gene expression analysis is one of the key areas of bioinformatics. It is used to determine the functions of a gene and to discover the effects of external stimuli on an organism. The analysis involves multiple steps: alignment, assembly, quantification, normalization, and modeling. This study will only focus on...

Bibliographic Details
Main Authors: Lim, Stanley Vincent Wee Ebol, Lim, Steven Edward Cheng, Ting, Carlos Louis Pacifico, Wong, Aaron Eldrich Cue
Format: text
Language: English
Published: Animo Repository 2022
Subjects: Bioinformatics; Nucleotide sequence; Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/etdb_comtech/4
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etdb_comtech-1004
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etdb_comtech-1004 2022-09-14T06:39:10Z FICS: Fast DNA/RNA to amino acid alignment using data level parallelism Lim, Stanley Vincent Wee Ebol Lim, Steven Edward Cheng Ting, Carlos Louis Pacifico Wong, Aaron Eldrich Cue 2022-01-01T08:00:00Z text application/pdf https://animorepository.dlsu.edu.ph/etdb_comtech/4 Computer Technology Bachelor's Theses English Animo Repository Bioinformatics Nucleotide sequence Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Bioinformatics
Nucleotide sequence
Computer Sciences
description Gene expression analysis is one of the key areas of bioinformatics. It is used to determine the functions of a gene and to discover the effects of external stimuli on an organism. The analysis involves multiple steps: alignment, assembly, quantification, normalization, and modeling. This study focuses only on the first step, sequence alignment, in which reads are mapped to a reference proteome. A frame alignment algorithm is used specifically to map a DNA/RNA sequence to a reference proteome. A non-model organism is an organism for which no proteome model exists; it can be mapped in two ways: de novo mapping or close reference proteome mapping. This study focuses on close reference mapping of Scylla serrata (mud crab), using Drosophila melanogaster (fruit fly) as the reference proteome model. This requires mapping millions of reads to the whole reference proteome, hence the need to speed up the alignment phase. Since most frame alignment algorithms are implemented sequentially, this study proposes FICS, a DNA/RNA-to-protein sequence alignment implementation that uses data-level parallelism. It includes a conversion of a sequential frame alignment algorithm to the SIMD paradigm and implementations on three different technologies: Intel SIMD ISA (AVX2), CUDA, and FPGA. Analysis shows that the Intel SIMD ISA implementation achieved a speedup of 3.5x, with an average matrix computation time of 2.5 ms. Its memory consumption peaked at 231 MB, and it drew around 42-52 W of power during runtime. The CUDA implementation of the frame alignment algorithm in the SIMT paradigm, on the other hand, resulted in suboptimal speeds, used up to 270 MiB of memory, and drew around 61-63 W during runtime. The FPGA implementation covered only the two input data preparation steps, with a speedup of about 13,940 times, a maximum memory consumption of 580 KB, and a power consumption of around 2 W.
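To make the idea of mapping a DNA/RNA read to protein space more concrete, the following is a minimal Python sketch of translating one read in its three forward reading frames using the standard genetic code. It is an illustration only and is not taken from FICS: the abstract does not describe the algorithm's internals, and the names `translate_frames` and `CODON_TABLE`, the choice of forward frames only, and the use of `X` for unrecognized codons are all assumptions made here.

```python
# Illustrative sketch only; this is NOT the FICS implementation.
# Assumptions: standard genetic code, forward frames only, 'X' for unknown codons.

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Build the 64-entry codon lookup table from the two strings above.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frames(read):
    """Translate a DNA or RNA read in forward reading frames 0, 1, and 2."""
    seq = read.upper().replace("U", "T")  # treat RNA input as DNA
    frames = []
    for offset in range(3):
        # Take complete codons only; trailing 1-2 bases are ignored.
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


if __name__ == "__main__":
    print(translate_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

Each frame is translated independently of the others, and each read independently of every other read. Independent work of this kind is what data-level parallelism can exploit in general; how FICS itself distributes work across SIMD lanes, CUDA threads, or FPGA logic is not stated in the abstract.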
format text
author Lim, Stanley Vincent Wee Ebol
Lim, Steven Edward Cheng
Ting, Carlos Louis Pacifico
Wong, Aaron Eldrich Cue
author_facet Lim, Stanley Vincent Wee Ebol
Lim, Steven Edward Cheng
Ting, Carlos Louis Pacifico
Wong, Aaron Eldrich Cue
author_sort Lim, Stanley Vincent Wee Ebol
title FICS: Fast DNA/RNA to amino acid alignment using data level parallelism
title_sort fics: fast dna/rna to amino acid alignment using data level parallelism
publisher Animo Repository
publishDate 2022
url https://animorepository.dlsu.edu.ph/etdb_comtech/4
_version_ 1744376653618872320