BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE
Sequence alignment is an important tool in the field of biology before conducting sequence analysis structurally or functionally. Comparing sequences with sequence alignment is to find the optimal solution in the form of pairs of residual sequences with the best cumulative similarity score. The scor...
Saved in:
Main Author: | |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/39896 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
id |
id-itb.:39896 |
---|---|
spelling |
id-itb.:398962019-06-28T13:26:17ZBIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE Setyorini Indonesia Dissertations sequence alignment, dynamic programming, approximate string matching, bit-parallelism. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/39896 Sequence alignment is an important tool in the field of biology before conducting sequence analysis structurally or functionally. Comparing sequences with sequence alignment is to find the optimal solution in the form of pairs of residual sequences with the best cumulative similarity score. The score is chosen by considering the weight of the match of the residual pair that represents the biological characteristics. Dynamic Programming (DP) is a classic algorithm for optimization problems. This algorithm is still a core algorithm in various sequence alignment tools. DP characteristics that trace all solution spaces ensure that optimal solutions can always be obtained, this must be paid for by computational time. At certain sizes DP computing time becomes unrealistic. There are 2 approaches to efforts to improve DP computing time, parallelization that requires additional processing units; then approximation, which is cutting down the search space for solutions with certain heuristic guidelines. Bit-parallelism appears as one solution to the problem of approximate strings. Bit-parallelism parallels the computational processes of the automata and DP algorithm. Parallel DP is done by combining units of the DP computation process into one unit to be processed like processing a word on a computer. Merging processes reduce computing time. Transformation is needed to make the combined form of the process unit resemble a word. Identifying algorithms in a new form is also needed to produce the same score matrix content with DP but in a faster time. Bit-parallelism has been applied to cases of the Longest Common Subsequences (LCS) and Edit Distance which use simple similarity weights. Sequence alignment in biological cases, especially for DNA and protein sequences requires similarity computing with more complex weights. Proteins that have 24 residues, pairs of protein residues have different similarity values. The application of sequence alignment with computing bit-parallelism in this case requires adjusting both the transformation process and its computational algorithm. This dissertation presents a computational design of bit-parallelism scores with multi-integer weights to answer the needs of the biological sequence alignment. Computational design is compiled from the development of computational score General Integer scoring where computation is compiled from the logical operation of the status of the appearance of integer variables in the score computing algorithm. Computational empirical testing of scores on sequences and with lengths of n and m respectively shows that computational design is able to produce the same content matrix score with the DP algorithm, and reduce the computational time matrix score from O (mn) to O (m). Assuming the biggest current w is 64 bits, this algorithm has the potential to save the process about 30-70% times text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Sequence alignment is an important tool in the field of biology before conducting sequence analysis structurally or functionally. Comparing sequences with sequence alignment is to find the optimal solution in the form of pairs of residual sequences with the best cumulative similarity score. The score is chosen by considering the weight of the match of the residual pair that represents the biological characteristics.
Dynamic Programming (DP) is a classic algorithm for optimization problems. This algorithm is still a core algorithm in various sequence alignment tools. DP characteristics that trace all solution spaces ensure that optimal solutions can always be obtained, this must be paid for by computational time. At certain sizes DP computing time becomes unrealistic. There are 2 approaches to efforts to improve DP computing time, parallelization that requires additional processing units; then approximation, which is cutting down the search space for solutions with certain heuristic guidelines.
Bit-parallelism appears as one solution to the problem of approximate strings. Bit-parallelism parallels the computational processes of the automata and DP algorithm. Parallel DP is done by combining units of the DP computation process into one unit to be processed like processing a word on a computer. Merging processes reduce computing time. Transformation is needed to make the combined form of the process unit resemble a word. Identifying algorithms in a new form is also needed to produce the same score matrix content with DP but in a faster time.
Bit-parallelism has been applied to cases of the Longest Common Subsequences (LCS) and Edit Distance which use simple similarity weights. Sequence alignment in biological cases, especially for DNA and protein sequences requires similarity computing with more complex weights. Proteins that have 24 residues, pairs of protein residues have different similarity values. The application of sequence alignment with computing bit-parallelism in this case requires adjusting both the transformation process and its computational algorithm.
This dissertation presents a computational design of bit-parallelism scores with multi-integer weights to answer the needs of the biological sequence alignment. Computational design is compiled from the development of computational score General Integer scoring where computation is compiled from the logical operation of the status of the appearance of integer
variables in the score computing algorithm. Computational empirical testing of scores on sequences and with lengths of n and m respectively shows that computational design is able to produce the same content matrix score with the DP algorithm, and reduce the computational time matrix score from O (mn) to O (m). Assuming the biggest current w is 64 bits, this algorithm has the potential to save the process about 30-70% times |
format |
Dissertations |
author |
Setyorini |
spellingShingle |
Setyorini BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE |
author_facet |
Setyorini |
author_sort |
Setyorini |
title |
BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE |
title_short |
BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE |
title_full |
BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE |
title_fullStr |
BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE |
title_full_unstemmed |
BIT-PARALLELISM SCORE COMPUTATION USING MULTI INTEGER WEIGHT FOR PAIRWISE SEQUENCE ALIGNMENT CASE : BIOLOGICAL SEQUENCE |
title_sort |
bit-parallelism score computation using multi integer weight for pairwise sequence alignment case : biological sequence |
url |
https://digilib.itb.ac.id/gdl/view/39896 |
_version_ |
1823638848041648128 |