iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule

DNA N⁶-methyladenine is a non-canonical DNA modification that occurs at low levels in various eukaryotes and has been identified as having an extremely important function in life. Moreover, about 0.2% of adenines in the rice genome are marked by DNA N⁶-methyladenine, a higher proportion than in most of the other s...

Bibliographic Details
Main Author: Le, Nguyen Quoc Khanh
Other Authors: School of Humanities
Format: Article
Language: English
Published: 2021
Subjects: Humanities::General; Skip Gram; Continuous Bag of Words
Online Access: https://hdl.handle.net/10356/150974
Institution: Nanyang Technological University
id sg-ntu-dr.10356-150974
record_format dspace
spelling sg-ntu-dr.10356-150974
2021-07-29T12:47:16Z
iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule
Le, Nguyen Quoc Khanh
School of Humanities
Humanities::General
Skip Gram
Continuous Bag of Words
DNA N⁶-methyladenine is a non-canonical DNA modification that occurs at low levels in various eukaryotes and has been identified as having an extremely important function in life. Moreover, about 0.2% of adenines in the rice genome are marked by DNA N⁶-methyladenine, a higher proportion than in most other species. Therefore, identifying these sites has become a very important area of study, especially in biological research. Although a few computational tools have been developed to address this problem, considerable effort is still required to improve their performance. In this study, we represent DNA sequences as continuous bags of nucleobases, including sub-word information of their biological words, which then serve as features fed into a support vector machine algorithm to identify the sites. Our model, which uses this hybrid approach, identified DNA N⁶-methyladenine sites with a jackknife test sensitivity of 86.48%, specificity of 89.09%, accuracy of 87.78%, and Matthews correlation coefficient (MCC) of 0.756. Compared with the state-of-the-art predictor and the other methods, our proposed model yields superior performance on all metrics. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences.
The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
2021-07-29T12:47:15Z
2021-07-29T12:47:15Z
2019
Journal Article
Le, N. Q. K. (2019). iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule. Molecular Genetics and Genomics, 294(5), 1173-1182. https://dx.doi.org/10.1007/s00438-019-01570-y
1617-4615
0000-0003-4896-7926
https://hdl.handle.net/10356/150974
10.1007/s00438-019-01570-y
31055655
2-s2.0-85065388389
5
294
1173
1182
en
Molecular Genetics and Genomics
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Humanities::General
Skip Gram
Continuous Bag of Words
spellingShingle Humanities::General
Skip Gram
Continuous Bag of Words
Le, Nguyen Quoc Khanh
iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule
description DNA N⁶-methyladenine is a non-canonical DNA modification that occurs at low levels in various eukaryotes and has been identified as having an extremely important function in life. Moreover, about 0.2% of adenines in the rice genome are marked by DNA N⁶-methyladenine, a higher proportion than in most other species. Therefore, identifying these sites has become a very important area of study, especially in biological research. Although a few computational tools have been developed to address this problem, considerable effort is still required to improve their performance. In this study, we represent DNA sequences as continuous bags of nucleobases, including sub-word information of their biological words, which then serve as features fed into a support vector machine algorithm to identify the sites. Our model, which uses this hybrid approach, identified DNA N⁶-methyladenine sites with a jackknife test sensitivity of 86.48%, specificity of 89.09%, accuracy of 87.78%, and Matthews correlation coefficient (MCC) of 0.756. Compared with the state-of-the-art predictor and the other methods, our proposed model yields superior performance on all metrics. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences.
author2 School of Humanities
author_facet School of Humanities
Le, Nguyen Quoc Khanh
format Article
author Le, Nguyen Quoc Khanh
author_sort Le, Nguyen Quoc Khanh
title iN6-methylat (5-step) : identifying DNA N⁶-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule
title_sort in6-methylat (5-step) : identifying dna n⁶-methyladenine sites in rice genome using continuous bag of nucleobases via chou's 5-step rule
publishDate 2021
url https://hdl.handle.net/10356/150974
_version_ 1707050441783640064
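The description field above mentions treating DNA sequences as bags of nucleobase "words" with sub-word information. As a rough illustration only, the Python sketch below splits a sequence into overlapping k-mers and then into fastText-style character n-grams with boundary markers. The helper names `kmer_words` and `subword_ngrams`, the choice of overlapping k-mers, the boundary markers, and the values of `k`, `min_n`, and `max_n` are all assumptions; this record does not specify how the published model segments sequences.

```python
def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Assumption: each k-mer plays the role of a 'biological word'.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def subword_ngrams(word, min_n=2, max_n=3):
    """Return character n-grams of a word wrapped in '<' and '>' markers.

    Assumption: sub-word units are fastText-style n-grams of length
    min_n..max_n taken over the marked-up word.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Hypothetical input sequence, used only for illustration.
words = kmer_words("GATTACA", k=3)
print(words)
print(subword_ngrams(words[0]))
```

In an embedding model of this kind, each word vector would typically be built from the vectors of its n-grams, and a per-sequence feature vector could then be passed to a classifier such as a support vector machine.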