Improved ENSPART for DNA Motif Prediction
Main Authors:
Format: E-Article
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS), 2017
Online Access:
http://ir.unimas.my/id/eprint/19016/1/SCT-073-revised-deposit%20%28abstrak%29.pdf
http://ir.unimas.my/id/eprint/19016/
http://www.ijbs.unimas.my/
Institution: Universiti Malaysia Sarawak
Summary: In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets, each of which is run through several distinct tools for candidate motif prediction. The candidate motifs obtained from the different data subsets are then merged to obtain the final motifs. Nevertheless, the original ENSPART has several limitations: (1) it uses the same background sequences to calculate the Receiver Operating Characteristic (ROC) of motifs obtained from different datasets, which introduces bias because different datasets may have different background distributions; and (2) it does not account for a motif and its reverse complement being duplicates, which leaves many redundant motifs in the result set that must be filtered out. In this work, we extend the original ENSPART to address these two issues. For the first issue, we use background sequences that are based on the distribution of bases in the input sequences. For the second issue, we employ a "triple" merging strategy to reduce redundant motifs. Our evaluation indicates that the two improvements yield higher AUC values than the original implementation.
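The abstract states only that the improved background sequences are based on the distribution of bases in the input sequences; it does not specify the background model. The Python sketch below assumes the simplest reading, a zero-order model in which each background base is drawn independently with the A/C/G/T frequencies of the input dataset, producing one length-matched background sequence per input sequence. The function names `base_frequencies` and `generate_background` and the example datasets are hypothetical and are not part of ENSPART.

```python
import random
from collections import Counter

def base_frequencies(sequences):
    """Pooled A/C/G/T frequencies over all input sequences (zero-order model).

    Characters other than A, C, G, T (e.g. N) are ignored.
    """
    counts = Counter(b for s in sequences for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("input contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}

def generate_background(sequences, seed=0):
    """Return one background sequence per input sequence, with the same length,
    whose bases are sampled independently from the input base frequencies."""
    freqs = base_frequencies(sequences)
    bases = list(freqs)                      # ["A", "C", "G", "T"]
    weights = [freqs[b] for b in bases]
    rng = random.Random(seed)                # seeded for reproducibility
    return ["".join(rng.choices(bases, weights=weights, k=len(s)))
            for s in sequences]

# Hypothetical usage: each dataset gets a background matched to its own composition.
datasets = {"set1": ["ACGTACGT", "GGGCCCAT"], "set2": ["AATTAATTAA"]}
backgrounds = {name: generate_background(seqs) for name, seqs in datasets.items()}
```

Calling `generate_background` separately on each dataset gives every dataset its own composition-matched background instead of one shared background set. Higher-order Markov backgrounds and dinucleotide shuffling are common alternatives; the abstract does not say which model the improved ENSPART uses.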
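The abstract does not describe how the "triple" merging strategy works, so the following is not that strategy. It is a minimal Python sketch of the underlying idea of treating a motif and its reverse complement as the same motif: each consensus string (IUPAC codes allowed) is mapped to an orientation-independent key, the lexicographically smaller of itself and its reverse complement, and only the first motif seen for each key is kept. All names are hypothetical.

```python
# IUPAC nucleotide codes and their complements
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

def reverse_complement(motif):
    """Reverse complement of a consensus string (case-insensitive, returned uppercase)."""
    return motif.upper().translate(COMPLEMENT)[::-1]

def canonical(motif):
    """Orientation-independent key: the lexicographically smaller of the
    motif and its reverse complement."""
    m = motif.upper()
    return min(m, reverse_complement(m))

def merge_reverse_complements(motifs):
    """Keep the first motif seen for each canonical key, preserving input order."""
    seen = set()
    kept = []
    for m in motifs:
        key = canonical(m)
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept

print(merge_reverse_complements(["ACGGT", "ACCGT", "TTTAA", "acggt"]))
# prints ['ACGGT', 'TTTAA']
```

Here `ACGGT` and its reverse complement `ACCGT` share the key `ACCGT`, so only the first of the two survives. Motif tools usually report position weight matrices rather than consensus strings, and comparing those would need a similarity threshold instead of an exact key.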