Improved ENSPART for DNA Motif Prediction


Bibliographic Details
Main Authors: Choong, Allen Chieng Hoon, Lee, Nung Kion, Bong, Chih How, Norshafarina, Omar
Format: E-Article
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS) 2017
Online Access:http://ir.unimas.my/id/eprint/19016/1/SCT-073-revised-deposit%20%28abstrak%29.pdf
http://ir.unimas.my/id/eprint/19016/
http://www.ijbs.unimas.my/
Institution: Universiti Malaysia Sarawak
Description
Summary: In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets and runs several distinct tools on each subset to predict candidate motifs. The candidate motifs obtained from the different data subsets are then merged to obtain the final motifs. Nevertheless, the original ENSPART has two limitations: (1) the same background sequences are used to calculate the Receiver Operating Characteristic (ROC) of motifs obtained from different datasets, which introduces bias because different datasets may have different background distributions; (2) it does not treat a motif and its reverse complement as duplicates, which leaves many redundant motifs in the result set that must then be filtered out. In this work, we extend the original ENSPART to address both issues. For the first, we employ background sequences based on the distribution of bases in the input sequences. For the second, we employ a "triple" merging strategy to reduce redundant motifs. Our evaluation results indicate that the two improvements yield better AUC values than the original implementation.
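The abstract does not specify how the composition-based background is built or how the "triple" merging works, so the following is only a minimal sketch of the two underlying ideas under stated assumptions. It assumes a zero-order (independent, identically distributed) base model fitted to the input sequences, treats motifs as exact consensus strings, and scores a motif's ROC AUC with the Mann-Whitney statistic on per-sequence hit counts. The reverse-complement deduplication shown here is a plain canonicalization step, not the authors' "triple" merging strategy. All function names (`background_like`, `dedup_reverse_complements`, `count_hits`, `auc`, and the rest) are hypothetical.

```python
import random
from collections import Counter

# Translation table for DNA complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def base_composition(seqs):
    """Relative frequency of A, C, G, T over all input sequences."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases in input")
    return {b: counts[b] / total for b in "ACGT"}


def background_like(seqs, seed=0):
    """Assumed zero-order background: one random sequence per input sequence,
    same length, with bases drawn i.i.d. from the input's base composition."""
    rng = random.Random(seed)
    comp = base_composition(seqs)
    bases = list(comp)
    weights = [comp[b] for b in bases]
    return ["".join(rng.choices(bases, weights=weights, k=len(s))) for s in seqs]


def reverse_complement(motif):
    """Complement every base, then reverse the string."""
    return motif.translate(COMPLEMENT)[::-1]


def canonical(motif):
    """The lexicographically smaller of a motif and its reverse complement."""
    m = motif.upper()
    return min(m, reverse_complement(m))


def dedup_reverse_complements(motifs):
    """Keep only the first motif seen from each motif/reverse-complement pair."""
    seen, kept = set(), []
    for m in motifs:
        key = canonical(m)
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept


def count_hits(motif, seq):
    """Overlapping occurrences of the motif on either strand of seq."""
    seq, m = seq.upper(), motif.upper()
    targets = {m, reverse_complement(m)}  # a palindrome collapses to one target
    k = len(m)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def auc(pos_scores, neg_scores):
    """ROC AUC as the Mann-Whitney statistic; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

In use, each data subset would get its own background via `background_like(subset)`, so a motif's AUC, computed as `auc([count_hits(m, s) for s in subset], [count_hits(m, b) for b in background])`, is measured against a background whose base composition matches that subset rather than a shared one. Generating the background per subset is the design point that addresses limitation (1); canonicalizing by reverse complement addresses the simplest form of limitation (2).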