Reoptimization of the Consensus Pattern Problem under Pattern Length Modification

In Bioinformatics, finding conserved regions in genomic sequences remains to be a challenge not just because of the increasing size of genomic data collected but because of the hardness of the combinatorial model of the problem. One problem formulation is called the Consensus Pattern Problem (CPP)....

Full description

Saved in:
Bibliographic Details
Main Authors: Clemente, Jhoirene B, Fernandez, Proceso L, Jr, Juayong, Richelle Ann B, Malinao, Jasmine A, Ordanel, Ivy, Adorna, Henry N
Format: text
Published: Archīum Ateneo 2019
Subjects:
Online Access:https://archium.ateneo.edu/discs-faculty-pubs/298
https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1301&context=discs-faculty-pubs
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Ateneo De Manila University
id ph-ateneo-arc.discs-faculty-pubs-1301
record_format eprints
spelling ph-ateneo-arc.discs-faculty-pubs-13012022-04-28T08:59:08Z Reoptimization of the Consensus Pattern Problem under Pattern Length Modification Clemente, Jhoirene B Fernandez, Proceso L, Jr Juayong, Richelle Ann B Malinao, Jasmine A Ordanel, Ivy Adorna, Henry N In Bioinformatics, finding conserved regions in genomic sequences remains to be a challenge not just because of the increasing size of genomic data collected but because of the hardness of the combinatorial model of the problem. One problem formulation is called the Consensus Pattern Problem (CPP). Given a set of t n-length strings S = {S1,..., St} defined over some constant size alphabet Σ and an integer l, where l ≤ n, the objective of CPP is to find an l-length string v and a set of l-length substrings si of each Si in S such that the total sum of d(si, v) is minimized for all 1 ≤ i ≤ t. Here d(x, y) denotes the Hamming distance between the two strings x and y. It is known that CPP is NP-hard i.e., unless P = NP, there is no polynomial-time algorithm that produces an optimal solution for CPP. In this study, we investigate a combinatorial setting called reoptimization in finding an approximate solution for this problem. We seek to identify whether a specific additional information can help in solving CPP. Specifically, we deal with the following reoptimization scenario. Suppose we have an optimal l-length consensus substring of a given set of sequences S. How can this information be beneficial in obtaining an (l + k)-length and (l – k)-length consensus for S? In this paper, we show that the reoptimization variant of the problem is still computationally hard even with k = 1. In response, we present four algorithms that make use of the given optimal solution – we prove that the first three algorithms produce solutions with quality that is bounded from above by an additive error that grows as the parameter k increases, while the fourth algorithm achieves a guaranteed approximation ratio. It has been shown that there is no efficient polynomial-time approximation scheme for CPP (Boucher 2015). In this paper, we show that we can save ( − ( + ) + ) ( ( + ) / ) steps in computation from the original running time of the known polynomial-time approximation scheme for CPP. 2019-09-01T07:00:00Z text application/pdf https://archium.ateneo.edu/discs-faculty-pubs/298 https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1301&context=discs-faculty-pubs Department of Information Systems & Computer Science Faculty Publications Archīum Ateneo approximation consensus pattern PTAS reoptimization Computer Sciences
institution Ateneo De Manila University
building Ateneo De Manila University Library
continent Asia
country Philippines
Philippines
content_provider Ateneo De Manila University Library
collection archium.Ateneo Institutional Repository
topic approximation
consensus pattern
PTAS
reoptimization
Computer Sciences
spellingShingle approximation
consensus pattern
PTAS
reoptimization
Computer Sciences
Clemente, Jhoirene B
Fernandez, Proceso L, Jr
Juayong, Richelle Ann B
Malinao, Jasmine A
Ordanel, Ivy
Adorna, Henry N
Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
description In Bioinformatics, finding conserved regions in genomic sequences remains to be a challenge not just because of the increasing size of genomic data collected but because of the hardness of the combinatorial model of the problem. One problem formulation is called the Consensus Pattern Problem (CPP). Given a set of t n-length strings S = {S1,..., St} defined over some constant size alphabet Σ and an integer l, where l ≤ n, the objective of CPP is to find an l-length string v and a set of l-length substrings si of each Si in S such that the total sum of d(si, v) is minimized for all 1 ≤ i ≤ t. Here d(x, y) denotes the Hamming distance between the two strings x and y. It is known that CPP is NP-hard i.e., unless P = NP, there is no polynomial-time algorithm that produces an optimal solution for CPP. In this study, we investigate a combinatorial setting called reoptimization in finding an approximate solution for this problem. We seek to identify whether a specific additional information can help in solving CPP. Specifically, we deal with the following reoptimization scenario. Suppose we have an optimal l-length consensus substring of a given set of sequences S. How can this information be beneficial in obtaining an (l + k)-length and (l – k)-length consensus for S? In this paper, we show that the reoptimization variant of the problem is still computationally hard even with k = 1. In response, we present four algorithms that make use of the given optimal solution – we prove that the first three algorithms produce solutions with quality that is bounded from above by an additive error that grows as the parameter k increases, while the fourth algorithm achieves a guaranteed approximation ratio. It has been shown that there is no efficient polynomial-time approximation scheme for CPP (Boucher 2015). In this paper, we show that we can save ( − ( + ) + ) ( ( + ) / ) steps in computation from the original running time of the known polynomial-time approximation scheme for CPP.
format text
author Clemente, Jhoirene B
Fernandez, Proceso L, Jr
Juayong, Richelle Ann B
Malinao, Jasmine A
Ordanel, Ivy
Adorna, Henry N
author_facet Clemente, Jhoirene B
Fernandez, Proceso L, Jr
Juayong, Richelle Ann B
Malinao, Jasmine A
Ordanel, Ivy
Adorna, Henry N
author_sort Clemente, Jhoirene B
title Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
title_short Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
title_full Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
title_fullStr Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
title_full_unstemmed Reoptimization of the Consensus Pattern Problem under Pattern Length Modification
title_sort reoptimization of the consensus pattern problem under pattern length modification
publisher Archīum Ateneo
publishDate 2019
url https://archium.ateneo.edu/discs-faculty-pubs/298
https://archium.ateneo.edu/cgi/viewcontent.cgi?article=1301&context=discs-faculty-pubs
_version_ 1733052858700398592