EMS-GT2: An Improved Exact Solution for the (l, d)-Planted Motif Problem

Finding DNA motifs is a widely studied area in the field of Computational Biology. Motifs signify different information that is useful for biologists. There are several variations of the motif finding problem, and one of these is called the (l, d)-motif search or Planted Motif Search problem (PMS)....

全面介紹

Saved in:
書目詳細資料
Main Authors: Fernandez, Proceso L, Jr, Ronquillo, Mark Joseph D
格式: text
出版: Archīum Ateneo 2017
主題:
在線閱讀:https://archium.ateneo.edu/discs-faculty-pubs/72
https://kkgpublications.com/ijtes-volume3-issue3-article5/
標簽: 添加標簽
沒有標簽, 成為第一個標記此記錄!
機構: Ateneo De Manila University
實物特徵
總結:Finding DNA motifs is a widely studied area in the field of Computational Biology. Motifs signify different information that is useful for biologists. There are several variations of the motif finding problem, and one of these is called the (l, d)-motif search or Planted Motif Search problem (PMS). In this paper, we propose the EMS-GT2 algorithm, an extension of the Exact Motif Search-Generate and Test (EMS- GT) which is an exact enumerative algorithm for PMS. In EMS-GT2, we incorporated a new speedup technique that is based on an important property that we have discovered, which we prove in this paper, and which has enabled a more efficient block-processing of candidate motifs. Our C++ implementation of EMS-GT2 running on synthetic data for several PMS challenge instances demonstrates that it is competitive with both the EMS-GT and qPMS9, the two current best exact solutions for PMS. In particular, EMS-GT2 is able to reduce the run-times of EMS-GT by 20.3%, 15.8% and 22.6% for the (l, d) challenge instances (13, 4), (15, 5) and (17, 6) respectively. It also outperforms qPMS9, having runtime reductions of 91.6%, 79.3%, 82.0%, 59.4% and 9.7% for the (9, 2), (11, 3), (13, 4), (15, 5) and (17, 6) synthetic challenge instances respectively.