Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method

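Convolutional neural networks (CNNs) are a popular choice for supervised DNA motif prediction because of their strong performance. To use a CNN, the input DNA sequences must be encoded as numerical values and represented as either vectors or multi-dimensional matrices. This paper evaluates a simple and more compact ordinal encoding method against the popular one-hot encoding for DNA sequences. We compare the performance of both encoding methods on three sets of datasets enriched with DNA motifs. We found that ordinal encoding performs comparably to the one-hot method, with a significant reduction in training time. In addition, one-hot encoding performs fairly consistently across datasets but requires a suitable CNN configuration to perform well. Ordinal encoding with matrix representation performs best on some of the evaluated datasets. This study suggests that the performance of CNNs for DNA motif discovery depends on a suitable design of the sequence encoding and representation. The good performance of the ordinal encoding method shows that there is still room for improvement in the one-hot encoding method.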
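The ordinal and one-hot encodings compared in this record can be sketched as follows. This is a minimal illustration, not the authors' code: the ordinal values (A=0.25, C=0.50, G=0.75, T=1.00, any other base 0.0), the channel-first layout, and the helper names are hypothetical choices made for the example. The sketch shows where the compactness comes from. A sequence of length L becomes 1×L numbers under ordinal encoding instead of 4×L under one-hot encoding, so a first 1-D convolutional layer sees one input channel instead of four and has fewer weights.

```python
import numpy as np

# Hypothetical ordinal mapping, for illustration only; the paper's exact
# values are not given in this record. Unknown bases such as N map to 0.0.
ORDINAL = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}
BASES = "ACGT"


def ordinal_encode(seq: str) -> np.ndarray:
    """Encode a DNA sequence as a 1 x L vector of ordinal values."""
    return np.array([[ORDINAL.get(b, 0.0) for b in seq.upper()]], dtype=np.float32)


def one_hot_encode(seq: str) -> np.ndarray:
    """Encode a DNA sequence as a 4 x L binary matrix (rows A, C, G, T)."""
    seq = seq.upper()
    out = np.zeros((len(BASES), len(seq)), dtype=np.float32)
    for j, base in enumerate(seq):
        i = BASES.find(base)
        if i >= 0:  # unknown bases leave an all-zero column
            out[i, j] = 1.0
    return out


def conv1d_params(in_channels: int, n_filters: int, kernel_size: int) -> int:
    """Number of weights plus biases in a 1-D convolutional layer."""
    return n_filters * in_channels * kernel_size + n_filters


if __name__ == "__main__":
    seq = "ACGTN"
    print(ordinal_encode(seq).shape)   # (1, 5)
    print(one_hot_encode(seq).shape)   # (4, 5)
    # Hypothetical first layer: 16 filters of width 8.
    print(conv1d_params(1, 16, 8))     # 144 (ordinal input, 1 channel)
    print(conv1d_params(4, 16, 8))     # 528 (one-hot input, 4 channels)
```

The parameter count is only one visible effect of the smaller input; this sketch does not establish where the training-time reduction reported above comes from.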

Bibliographic Details
Main Authors: Chieng, Allen Hoon Choong; Lee, Nung Kion
Format: E-Article
Language: English
Published: IEEE 2017
Subjects: Q Science (General)
Online Access: http://ir.unimas.my/id/eprint/18960/7/Evaluation%20of%20Convolutionary%20Neural%20Networks%20%28abstract%29.pdf
http://ir.unimas.my/id/eprint/18960/
https://www.biorxiv.org/content/early/2017/10/25/186965
Institution: Universiti Malaysia Sarawak
Citation: Chieng, Allen Hoon Choong and Lee, Nung Kion (2017) Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method. International Conference On Computer And Drone Applications (ICONDA) 2017. ISBN 978-1-5386-0765-7 (In Press). Peer reviewed. Preprint DOI: 10.1101/186965