FICS: Fast DNA/RNA to amino acid alignment using data level parallelism
Gene expression analysis is one of the key areas of bioinformatics. It is used to determine the functions of a gene and to discover the effects of external stimuli on an organism. The analysis involves multiple steps: alignment, assembly, quantification, normalization, and modeling. This study will only focus on...
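Main Authors: | Lim, Stanley Vincent Wee Ebol; Lim, Steven Edward Cheng; Ting, Carlos Louis Pacifico; Wong, Aaron Eldrich Cue |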
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository 2022 |
Subjects: | Bioinformatics; Nucleotide sequence; Computer Sciences |
Online Access: | https://animorepository.dlsu.edu.ph/etdb_comtech/4 |
Institution: | De La Salle University |
id |
oai:animorepository.dlsu.edu.ph:etdb_comtech-1004 |
---|---|
record_format |
eprints |
spelling |
oai:animorepository.dlsu.edu.ph:etdb_comtech-1004; 2022-09-14T06:39:10Z; FICS: Fast DNA/RNA to amino acid alignment using data level parallelism; Lim, Stanley Vincent Wee Ebol; Lim, Steven Edward Cheng; Ting, Carlos Louis Pacifico; Wong, Aaron Eldrich Cue; 2022-01-01T08:00:00Z; text; application/pdf; https://animorepository.dlsu.edu.ph/etdb_comtech/4; Computer Technology Bachelor's Theses; English; Animo Repository; Bioinformatics; Nucleotide sequence; Computer Sciences |
institution |
De La Salle University |
building |
De La Salle University Library |
continent |
Asia |
country |
Philippines |
content_provider |
De La Salle University Library |
collection |
DLSU Institutional Repository |
language |
English |
topic |
Bioinformatics; Nucleotide sequence; Computer Sciences |
spellingShingle |
Bioinformatics; Nucleotide sequence; Computer Sciences; Lim, Stanley Vincent Wee Ebol; Lim, Steven Edward Cheng; Ting, Carlos Louis Pacifico; Wong, Aaron Eldrich Cue; FICS: Fast DNA/RNA to amino acid alignment using data level parallelism |
description |
Gene expression analysis is one of the key areas of bioinformatics. It is used to determine the functions of a gene and to discover the effects of external stimuli on an organism. The analysis involves multiple steps: alignment, assembly, quantification, normalization, and modeling. This study focuses only on the first step, the sequence alignment phase, where reads are mapped to a reference proteome. A frame alignment algorithm is used specifically to map a DNA/RNA sequence to a reference proteome. A non-model organism is an organism for which no proteome model exists, and its reads can be mapped in two ways: de novo mapping or close reference proteome mapping. This study focused on the close reference mapping of Scylla serrata (mud crab), using Drosophila melanogaster (fruit fly) as the reference proteome model. This requires mapping millions of reads to the whole reference proteome, hence the need to speed up the alignment phase. Since most frame alignment algorithms are implemented sequentially, this study proposes FICS, a DNA/RNA-to-protein sequence alignment implementation that uses data-level parallelism. It includes a conversion of a sequential frame alignment algorithm to the SIMD paradigm and implementations on three different technologies: Intel SIMD ISA (AVX2), CUDA, and FPGA. Analysis shows that the Intel SIMD ISA implementation achieved a speedup of 3.5x with an average matrix computation time of 2.5 ms. Its memory consumption peaked at 231 MB, and it drew around 42-52 watts of power during runtime. The CUDA implementation of the frame alignment algorithm in the SIMT paradigm, on the other hand, resulted in suboptimal speeds, used up to 270 MiB of memory, and drew around 61-63 watts during runtime. The FPGA implementation covered only the two input data preparation steps, with a speedup of about 13,940 times, a maximum memory consumption of 580 KB, and a power consumption of around 2 watts. |
format |
text |
author |
Lim, Stanley Vincent Wee Ebol; Lim, Steven Edward Cheng; Ting, Carlos Louis Pacifico; Wong, Aaron Eldrich Cue |
author_facet |
Lim, Stanley Vincent Wee Ebol; Lim, Steven Edward Cheng; Ting, Carlos Louis Pacifico; Wong, Aaron Eldrich Cue |
author_sort |
Lim, Stanley Vincent Wee Ebol |
title |
FICS: Fast DNA/RNA to amino acid alignment using data level parallelism |
title_short |
FICS: Fast DNA/RNA to amino acid alignment using data level parallelism |
title_full |
FICS: Fast DNA/RNA to amino acid alignment using data level parallelism |
title_fullStr |
FICS: Fast DNA/RNA to amino acid alignment using data level parallelism |
title_full_unstemmed |
FICS: Fast DNA/RNA to amino acid alignment using data level parallelism |
title_sort |
fics: fast dna/rna to amino acid alignment using data level parallelism |
publisher |
Animo Repository |
publishDate |
2022 |
url |
https://animorepository.dlsu.edu.ph/etdb_comtech/4 |
_version_ |
1744376653618872320 |
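
The description in this record mentions mapping DNA/RNA reads to a reference proteome. As a rough illustration of why reading frames matter in that setting, the sketch below translates one read in its three forward frames using the standard genetic code. It is not taken from the thesis: the function name `translate_frames`, the use of `X` for unknown codons, and the restriction to forward frames are all assumptions made for this example, and it says nothing about how FICS itself prepares its inputs.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (first base varies slowest).
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(read):
    """Translate a DNA/RNA read in its three forward reading frames.

    Hypothetical helper for illustration only. Codons containing
    characters outside TCAG (for example N) are translated to 'X'.
    """
    read = read.upper().replace("U", "T")  # treat RNA input as DNA
    frames = []
    for offset in range(3):
        # Step through complete codons only, starting at this frame's offset.
        codons = (read[i:i + 3] for i in range(offset, len(read) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


if __name__ == "__main__":
    for offset, protein in enumerate(translate_frames("ATGGCCTAA")):
        print(offset, protein)
```

Each of the three translated strings could then be compared against a protein reference by an ordinary protein alignment routine; a frame alignment algorithm as described in the abstract would instead consider the frames jointly.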