Improved ENSPART for DNA Motif Prediction

In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets, each of which is run through several distinct tools for candidate motif prediction. The candidate motifs obtained from the different data subsets are merged to obtain the final m...

Bibliographic Details
Main Authors: Choong, Allen Chieng Hoon, Lee, Nung Kion, Bong, Chih How, Norshafarina, Omar
Format: E-Article
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS) 2017
Subjects: Q Science (General); T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/19016/1/SCT-073-revised-deposit%20%28abstrak%29.pdf
http://ir.unimas.my/id/eprint/19016/
http://www.ijbs.unimas.my/
Institution: Universiti Malaysia Sarawak
id my.unimas.ir.19016
record_format eprints
spelling my.unimas.ir.19016 2018-01-03T06:06:16Z http://ir.unimas.my/id/eprint/19016/ Improved ENSPART for DNA Motif Prediction. Choong, Allen Chieng Hoon; Lee, Nung Kion; Bong, Chih How; Norshafarina, Omar. Q Science (General); T Technology (General). In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets, each of which is run through several distinct tools for candidate motif prediction. The candidate motifs obtained from the different data subsets are merged to obtain the final motifs. Nevertheless, the original ENSPART has several limitations: (1) the same background sequences are used to calculate the Receiver Operating Characteristic (ROC) of motifs obtained from different datasets, which causes bias because different datasets might have different background distributions; (2) it does not consider the duplication of a motif and its reverse complement, which leaves many redundant motifs in the result set that require filtering. In this work, we extend the original ENSPART to solve those two issues. For the first issue, we employ background sequences that are based on the distribution of bases in the input sequences. For the second issue, we employ a "triple" merging strategy to reduce redundant motifs. Our evaluation results indicate that the two improvements obtain better AUC values than the original implementation. Universiti Malaysia Sarawak (UNIMAS) 2017-12 E-Article PeerReviewed text en http://ir.unimas.my/id/eprint/19016/1/SCT-073-revised-deposit%20%28abstrak%29.pdf Choong, Allen Chieng Hoon and Lee, Nung Kion and Bong, Chih How and Norshafarina, Omar (2017) Improved ENSPART for DNA Motif Prediction. International Journal of Business and Society, 18 (S4). pp. 1-6. ISSN 15116670 http://www.ijbs.unimas.my/
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic Q Science (General)
T Technology (General)
description In our previous work we proposed ENSPART, an ensemble method for DNA motif discovery that partitions the input dataset into several equal-size subsets, each of which is run through several distinct tools for candidate motif prediction. The candidate motifs obtained from the different data subsets are merged to obtain the final motifs. Nevertheless, the original ENSPART has several limitations: (1) the same background sequences are used to calculate the Receiver Operating Characteristic (ROC) of motifs obtained from different datasets, which causes bias because different datasets might have different background distributions; (2) it does not consider the duplication of a motif and its reverse complement, which leaves many redundant motifs in the result set that require filtering. In this work, we extend the original ENSPART to solve those two issues. For the first issue, we employ background sequences that are based on the distribution of bases in the input sequences. For the second issue, we employ a "triple" merging strategy to reduce redundant motifs. Our evaluation results indicate that the two improvements obtain better AUC values than the original implementation.
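The two ideas named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not specify how the background sequences are generated or how the "triple" merging strategy works, so the sketch assumes (a) a simple zero-order model that samples each background base independently from the base frequencies of the input sequences, and (b) plain removal of motifs that duplicate an earlier motif or its reverse complement. All function names are hypothetical.

```python
import random
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_frequencies(sequences):
    """Return the relative frequency of A, C, G and T across all input sequences."""
    counts = Counter(b for s in sequences for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found in the input sequences")
    return {b: counts[b] / total for b in "ACGT"}


def make_background(sequences, rng=None):
    """Assumed zero-order model: one random background sequence per input
    sequence, same length, bases drawn from the input base distribution."""
    rng = rng or random.Random(0)
    freqs = base_frequencies(sequences)
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return ["".join(rng.choices(bases, weights=weights, k=len(s))) for s in sequences]


def reverse_complement(motif):
    """Return the reverse complement of a DNA string."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def canonical(motif):
    """Pick one representative for a motif and its reverse complement."""
    m = motif.upper()
    return min(m, reverse_complement(m))


def drop_reverse_complement_duplicates(motifs):
    """Keep the first occurrence of each motif, treating a motif and its
    reverse complement as the same motif. This is a simplification, not
    the 'triple' merging strategy itself."""
    seen = set()
    kept = []
    for m in motifs:
        key = canonical(m)
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept
```

For example, `drop_reverse_complement_duplicates(["AAGC", "GCTT"])` keeps only `"AAGC"`, because `"GCTT"` is its reverse complement. Real motif finders usually report position weight matrices rather than exact strings, so a practical merge step would compare matrices under a similarity threshold instead of testing string equality.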
format E-Article
author Choong, Allen Chieng Hoon
Lee, Nung Kion
Bong, Chih How
Norshafarina, Omar
title Improved ENSPART for DNA Motif Prediction
publisher Universiti Malaysia Sarawak (UNIMAS)
publishDate 2017
url http://ir.unimas.my/id/eprint/19016/1/SCT-073-revised-deposit%20%28abstrak%29.pdf
http://ir.unimas.my/id/eprint/19016/
http://www.ijbs.unimas.my/
_version_ 1644512979187662848