i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes

© 2020 The Authors. N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand its biological functions. In this work, we developed a new computat...

Bibliographic Details
Main Authors: Md Mehedi Hasan, Balachandran Manavalan, Watshara Shoombuatong, Mst Shamima Khatun, Hiroyuki Kurata
Other Authors: Kyushu Institute of Technology
Format: Article
Published: 2020
Subjects: Biochemistry, Genetics and Molecular Biology; Computer Science
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/54491
Institution: Mahidol University
id th-mahidol.54491
record_format dspace
spelling th-mahidol.54491 2020-05-05T12:18:22Z; Kyushu Institute of Technology; Ajou University, School of Medicine; Japan Society for the Promotion of Science; Mahidol University; Biochemistry, Genetics and Molecular Biology; Computer Science; 2020-05-05T05:08:02Z; 2020-05-05T05:08:02Z; 2020-01-01; Article; Computational and Structural Biotechnology Journal. Vol.18, (2020), 906-912; 10.1016/j.csbj.2020.04.001; 20010370; 2-s2.0-85083319627; https://repository.li.mahidol.ac.th/handle/123456789/54491; Mahidol University; SCOPUS; https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85083319627&origin=inward
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Biochemistry, Genetics and Molecular Biology
Computer Science
description © 2020 The Authors. N4-methylcytosine (4mC) is one of the most important DNA modifications and is involved in regulating cell differentiation and gene expression. Accurate identification of 4mC sites is necessary to understand its biological functions. In this work, we developed a new computational predictor, i4mC-Mouse, to identify 4mC sites in the mouse genome. Six encoding schemes that cover different characteristics of DNA sequence information were explored: k-space nucleotide composition (KSNC), k-mer nucleotide composition (Kmer), mono-nucleotide binary encoding (MBE), dinucleotide binary encoding, electron–ion interaction pseudopotentials (EIIP), and dinucleotide physicochemical composition. We then built six random forest (RF)-based models, one per encoding, and linearly combined their probability scores to construct the final predictor. Of the six RF-based models, four (Kmer, KSNC, MBE, and EIIP) were sufficient, contributing 10%, 45%, 25%, and 20% of the prediction performance, respectively. On the independent test, i4mC-Mouse predicted 4mC sites with an accuracy of 0.816 and a Matthews correlation coefficient (MCC) of 0.633, approximately 2.5% and 5% higher, respectively, than those of the existing method (4mCpred-EL). For experimental biologists, a freely available web application was implemented at http://kurata14.bio.kyutech.ac.jp/i4mC-Mouse/.
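The pipeline summarized in the description (encode each sequence, score it with one model per encoding, linearly combine the probability scores, then report accuracy and MCC) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the value k=2, the A/C/G/T one-hot ordering, and the use of the reported contribution percentages directly as combination weights are all assumptions made for illustration; the per-encoding probability scores are assumed to come from separately trained RF models that are not shown.

```python
from itertools import product
from math import sqrt

# Assumption: the contribution percentages quoted in the abstract are used
# directly as linear-combination weights for the four retained encodings.
WEIGHTS = {"Kmer": 0.10, "KSNC": 0.45, "MBE": 0.25, "EIIP": 0.20}

# Assumption: A, C, G, T ordering for the one-hot (binary) encoding.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def kmer_composition(seq, k=2):
    """Return the frequency of every length-k word over ACGT (k is assumed)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    n_windows = len(seq) - k + 1
    for i in range(n_windows):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing non-ACGT symbols
            counts[word] += 1
    total = max(n_windows, 1)
    return [counts[w] / total for w in words]


def mono_binary_encoding(seq):
    """One-hot encode each nucleotide; unknown symbols become all zeros."""
    return [bit for base in seq for bit in ONE_HOT.get(base, [0, 0, 0, 0])]


def combine_scores(scores, weights=WEIGHTS):
    """Linearly combine per-encoding probability scores into one score."""
    return sum(weights[name] * scores[name] for name in weights)


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

For example, `combine_scores({"Kmer": 0.6, "KSNC": 0.8, "MBE": 0.7, "EIIP": 0.5})` returns a single combined score that could then be thresholded (the threshold is likewise not stated in this record) to call a site 4mC or non-4mC.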
author2 Kyushu Institute of Technology
format Article
author Md Mehedi Hasan
Balachandran Manavalan
Watshara Shoombuatong
Mst Shamima Khatun
Hiroyuki Kurata
author_sort Md Mehedi Hasan
title i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes
title_sort i4mc-mouse: improved identification of dna n4-methylcytosine sites in the mouse genome using multiple encoding schemes
publishDate 2020
url https://repository.li.mahidol.ac.th/handle/123456789/54491
_version_ 1763496528410836992