iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule
DNA N⁶-methyladenine is a non-canonical DNA modification that occurs at low levels in different eukaryotes and has been identified as having an extremely important function in life. Moreover, about 0.2% of adenines are marked by DNA N⁶-methyladenine in the rice genome, higher than in most of the other s...
Main Author: | Le, Nguyen Quoc Khanh |
---|---|
Other Authors: | School of Humanities |
Format: | Article |
Language: | English |
Published: | 2021 |
Subjects: | Humanities::General; Skip Gram; Continuous Bag of Words |
Online Access: | https://hdl.handle.net/10356/150974 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-150974 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1509742021-07-29T12:47:16Z iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule Le, Nguyen Quoc Khanh School of Humanities Humanities::General Skip Gram Continuous Bag of Words DNA N⁶-methyladenine is a non-canonical DNA modification that occurs at low levels in different eukaryotes and has been identified as having an extremely important function in life. Moreover, about 0.2% of adenines are marked by DNA N⁶-methyladenine in the rice genome, higher than in most other species. Therefore, identifying these sites has become a very important area of study, especially in biological research. Although a few computational tools have been employed to address this problem, considerable effort is still required to improve their performance. In this study, we represent DNA sequences by continuous bags of nucleobases, including sub-word information of their biological words, which then serve as features fed into a support vector machine algorithm to identify the sites. Our model, which uses this hybrid approach, could identify DNA N⁶-methyladenine sites with a jackknife test sensitivity of 86.48%, specificity of 89.09%, accuracy of 87.78%, and MCC of 0.756. Compared to the state-of-the-art predictor as well as the other methods, our proposed model yields superior performance in all the metrics. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences. The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. 2021-07-29T12:47:15Z 2021-07-29T12:47:15Z 2019 Journal Article Le, N. Q. K. (2019). iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule. Molecular Genetics and Genomics, 294(5), 1173-1182.
https://dx.doi.org/10.1007/s00438-019-01570-y 1617-4615 0000-0003-4896-7926 https://hdl.handle.net/10356/150974 10.1007/s00438-019-01570-y 31055655 2-s2.0-85065388389 5 294 1173 1182 en Molecular Genetics and Genomics © 2019 Springer-Verlag GmbH Germany, part of Springer Nature. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Humanities::General Skip Gram Continuous Bag of Words |
spellingShingle |
Humanities::General Skip Gram Continuous Bag of Words Le, Nguyen Quoc Khanh iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule |
description |
DNA N⁶-methyladenine is a non-canonical DNA modification that occurs at low levels in different eukaryotes and has been identified as having an extremely important function in life. Moreover, about 0.2% of adenines are marked by DNA N⁶-methyladenine in the rice genome, higher than in most other species. Therefore, identifying these sites has become a very important area of study, especially in biological research. Although a few computational tools have been employed to address this problem, considerable effort is still required to improve their performance. In this study, we represent DNA sequences by continuous bags of nucleobases, including sub-word information of their biological words, which then serve as features fed into a support vector machine algorithm to identify the sites. Our model, which uses this hybrid approach, could identify DNA N⁶-methyladenine sites with a jackknife test sensitivity of 86.48%, specificity of 89.09%, accuracy of 87.78%, and MCC of 0.756. Compared to the state-of-the-art predictor as well as the other methods, our proposed model yields superior performance in all the metrics. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences. |
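The description above mentions "biological words" with sub-word information and reports sensitivity, specificity, accuracy, and MCC. The following is a minimal sketch of what those two ideas can look like in code, not the author's implementation. The record does not state the word length, the n-gram range, or the toolkit used, so the function names, `k=3`, and `n_min`/`n_max` are illustrative assumptions. The `<` and `>` boundary markers follow the convention of fastText-style sub-word models. The confusion-matrix counts in the usage note are made up and are not the paper's data.

```python
import math


def kmers(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers ("biological words").

    k=3 is an assumed value; the record does not state the word length.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def subword_ngrams(word, n_min=2, n_max=3):
    """Return character n-grams of a word wrapped in boundary markers.

    The n-gram range is an assumption chosen for illustration.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy, and MCC from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

For example, `kmers("GATTACA")` splits the sequence into five overlapping 3-mers, and `binary_metrics(tp=8, fn=2, tn=9, fp=1)` gives a sensitivity of 0.8 and a specificity of 0.9 for those hypothetical counts. In a jackknife (leave-one-out) test, each sample is predicted by a model trained on all the other samples, and the counts are accumulated over every sample before the metrics are computed.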
author2 |
School of Humanities |
author_facet |
School of Humanities Le, Nguyen Quoc Khanh |
format |
Article |
author |
Le, Nguyen Quoc Khanh |
author_sort |
Le, Nguyen Quoc Khanh |
title |
iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/150974 |
_version_ |
1707050441783640064 |