i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes
© 2020 The Authors. N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand various biological functions. In this work, we developed a new computat...
Main Authors: | Md Mehedi Hasan, Balachandran Manavalan, Watshara Shoombuatong, Mst Shamima Khatun, Hiroyuki Kurata |
---|---|
Other Authors: | Kyushu Institute of Technology |
Format: | Article |
Published: | 2020 |
Subjects: | Biochemistry, Genetics and Molecular Biology; Computer Science |
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/54491 |
Institution: | Mahidol University |
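
The full abstract in this record names six sequence encoding schemes, among them k-mer nucleotide composition (Kmer) and mono nucleotide binary encoding (MBE). As a rough illustration of what two of these encodings compute, here is a minimal Python sketch. The function names, the choice of `k=2`, the normalisation by window count, and the handling of ambiguous bases are assumptions made for this example; the record does not state the parameters the authors used.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_composition(seq, k=2):
    """Return the normalised frequency of every k-mer over A/C/G/T.

    Features are ordered lexicographically (AA, AC, AG, ...).
    Hypothetical helper: k and the normalisation are assumptions.
    """
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows with ambiguous bases such as N
            counts[sub] += 1
    total = max(windows, 1)  # avoid division by zero for short inputs
    return [counts[m] / total for m in kmers]


def mono_binary_encoding(seq):
    """One-hot encode each base as four bits in the order A, C, G, T.

    Unknown bases are encoded as all zeros (an assumption of this sketch).
    """
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    encoded = []
    for base in seq:
        encoded.extend(table.get(base, [0, 0, 0, 0]))
    return encoded
```

With these definitions, a sequence of length n yields 4^k Kmer features and 4n MBE features, so the two encodings capture composition and position respectively.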
id |
th-mahidol.54491 |
---|---|
record_format |
dspace |
spelling |
th-mahidol.54491 2020-05-05T12:18:22Z i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes Md Mehedi Hasan Balachandran Manavalan Watshara Shoombuatong Mst Shamima Khatun Hiroyuki Kurata Kyushu Institute of Technology Ajou University, School of Medicine Japan Society for the Promotion of Science Mahidol University Biochemistry, Genetics and Molecular Biology Computer Science © 2020 The Authors. N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand various biological functions. In this work, we developed a new computational predictor called i4mC-Mouse to identify 4mC sites in the mouse genome. Six encoding schemes were explored that cover different characteristics of DNA sequence information: k-space nucleotide composition (KSNC), k-mer nucleotide composition (Kmer), mono nucleotide binary encoding (MBE), dinucleotide binary encoding, electron–ion interaction pseudo potentials (EIIP), and dinucleotide physicochemical composition. Subsequently, we built six random forest (RF)-based encoding models and linearly combined their probability scores to construct the final predictor. Of the six RF-based models, four proved sufficient: the Kmer, KSNC, MBE, and EIIP encodings, which contributed 10%, 45%, 25%, and 20% of the prediction performance, respectively. On the independent test, i4mC-Mouse predicted 4mC sites with an accuracy of 0.816 and an MCC of 0.633, approximately 2.5% and 5% higher than those of the existing method (4mCpred-EL). For experimental biologists, a freely available web application was implemented at http://kurata14.bio.kyutech.ac.jp/i4mC-Mouse/. 2020-05-05T05:08:02Z 2020-05-05T05:08:02Z 2020-01-01 Article Computational and Structural Biotechnology Journal. Vol.18, (2020), 906-912 10.1016/j.csbj.2020.04.001 20010370 2-s2.0-85083319627 https://repository.li.mahidol.ac.th/handle/123456789/54491 Mahidol University SCOPUS https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85083319627&origin=inward |
institution |
Mahidol University |
building |
Mahidol University Library |
continent |
Asia |
country |
Thailand |
content_provider |
Mahidol University Library |
collection |
Mahidol University Institutional Repository |
topic |
Biochemistry, Genetics and Molecular Biology; Computer Science |
spellingShingle |
Biochemistry, Genetics and Molecular Biology Computer Science Md Mehedi Hasan Balachandran Manavalan Watshara Shoombuatong Mst Shamima Khatun Hiroyuki Kurata i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes |
description |
© 2020 The Authors. N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand various biological functions. In this work, we developed a new computational predictor called i4mC-Mouse to identify 4mC sites in the mouse genome. Six encoding schemes were explored that cover different characteristics of DNA sequence information: k-space nucleotide composition (KSNC), k-mer nucleotide composition (Kmer), mono nucleotide binary encoding (MBE), dinucleotide binary encoding, electron–ion interaction pseudo potentials (EIIP), and dinucleotide physicochemical composition. Subsequently, we built six random forest (RF)-based encoding models and linearly combined their probability scores to construct the final predictor. Of the six RF-based models, four proved sufficient: the Kmer, KSNC, MBE, and EIIP encodings, which contributed 10%, 45%, 25%, and 20% of the prediction performance, respectively. On the independent test, i4mC-Mouse predicted 4mC sites with an accuracy of 0.816 and an MCC of 0.633, approximately 2.5% and 5% higher than those of the existing method (4mCpred-EL). For experimental biologists, a freely available web application was implemented at http://kurata14.bio.kyutech.ac.jp/i4mC-Mouse/. |
author2 |
Kyushu Institute of Technology |
author_facet |
Kyushu Institute of Technology Md Mehedi Hasan Balachandran Manavalan Watshara Shoombuatong Mst Shamima Khatun Hiroyuki Kurata |
format |
Article |
author |
Md Mehedi Hasan; Balachandran Manavalan; Watshara Shoombuatong; Mst Shamima Khatun; Hiroyuki Kurata |
author_sort |
Md Mehedi Hasan |
title |
i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes |
title_short |
i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes |
title_full |
i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes |
title_fullStr |
i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes |
title_full_unstemmed |
i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes |
title_sort |
i4mc-mouse: improved identification of dna n4-methylcytosine sites in the mouse genome using multiple encoding schemes |
publishDate |
2020 |
url |
https://repository.li.mahidol.ac.th/handle/123456789/54491 |
_version_ |
1763496528410836992 |
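
The abstract in this record says the final predictor linearly combines the probability scores of the RF-based models, and it reports accuracy and MCC on an independent test. A minimal Python sketch of both steps follows. It assumes that the quoted contributions (10%, 45%, 25%, 20%) can be used directly as combination weights and that a score of 0.5 or higher is called a 4mC site; neither detail is confirmed by this record, and the function names are hypothetical.

```python
import math

# Assumption: the percentages quoted in the abstract are used directly
# as linear-combination weights for the four retained RF models.
WEIGHTS = {"Kmer": 0.10, "KSNC": 0.45, "MBE": 0.25, "EIIP": 0.20}


def combine_scores(scores, weights=WEIGHTS):
    """Weighted sum of per-model probability scores for one sequence."""
    return sum(weights[name] * scores[name] for name in weights)


def predict_label(scores, threshold=0.5):
    """Call a 4mC site (1) when the combined score reaches the threshold.

    The 0.5 cut-off is an assumption of this sketch.
    """
    return 1 if combine_scores(scores) >= threshold else 0


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

For example, `predict_label({"Kmer": 0.9, "KSNC": 0.8, "MBE": 0.4, "EIIP": 0.6})` combines the four scores into a single value before thresholding, so a strong KSNC score can outweigh a weak MBE score under these weights.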