Deep learning based DNA:RNA triplex forming potential prediction
Main Authors: | Zhang, Yu; Long, Yahui; Kwoh, Chee Keong |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2021 |
Subjects: | Science::Biological sciences; Long Noncoding RNAs; DNA:RNA Triplex |
Online Access: | https://hdl.handle.net/10356/145881 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-145881 |
---|---|
record_format |
dspace |
citation |
Zhang, Y., Long, Y., & Kwoh, C. K. (2020). Deep learning based DNA:RNA triplex forming potential prediction. BMC Bioinformatics, 21(1), 522. doi:10.1186/s12859-020-03864-0 |
identifiers |
ISSN 1471-2105; DOI 10.1186/s12859-020-03864-0; PubMed ID 33183242; Scopus EID 2-s2.0-85095931519; ORCID 0000-0002-8547-6387 |
funding |
Agency for Science, Technology and Research (A*STAR); Ministry of Education (MOE). Publication costs are funded by the A*STAR-NTU-SUTD AI Partnership [RGANS1905], the Singapore Ministry of Education Academic Research Fund Tier 1 [2020-T1-001-130(RG15/20)], and the Singapore Ministry of Education Academic Research Fund Tier 2 [MOE2019-T2-2-175]. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. |
version |
Published version; Journal Article (application/pdf); issued 2020; accessioned and made available 2021-01-13T05:21:41Z |
rights |
© 2020 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Science::Biological sciences; Long Noncoding RNAs; DNA:RNA Triplex |
description |
Background: Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from base-pairing rules. However, these methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, yet the small number of experimentally verified triplex-forming lncRNAs suggests that not all of them can form triplexes in practice; and (2) their predictions consider only the theoretical relationship and lack features from experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by convolutional neural networks. In fivefold cross-validation, the average areas under the ROC curve and the precision-recall curve (PRC) are 0.9649 and 0.9996 for the triplex-forming lncRNA dataset after redundancy removal at a threshold of 0.8, and 0.8705 and 0.9671 for triplex DNA site prediction. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs. Conclusions: TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, as well as the potential of a DNA site to form a triplex. It may provide insights into the exploration of lncRNA functions. |
author2 |
School of Computer Science and Engineering |
format |
Article |
author |
Zhang, Yu; Long, Yahui; Kwoh, Chee Keong |
title |
Deep learning based DNA:RNA triplex forming potential prediction |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/145881 |
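The abstract reports results on a triplex-forming lncRNA dataset after redundancy removal at a threshold of 0.8, but the record does not say which tool or similarity measure was used. As a rough illustration only, the Python sketch below applies a greedy filter in the spirit of length-sorted clustering tools: sequences are visited longest-first and kept only if their similarity to every retained sequence is below the threshold. The similarity measure is an assumption: it uses the standard library's `difflib.SequenceMatcher.ratio()` as a stand-in for alignment-based sequence identity, and the helper names `similarity` and `remove_redundancy` are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]: difflib's ratio 2*M/T, where M is the
    number of matched characters and T the combined length. This is NOT
    alignment-based sequence identity; it only keeps the sketch self-contained."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundancy(seqs: list[str], threshold: float = 0.8) -> list[str]:
    """Hypothetical greedy redundancy filter: visit sequences longest-first
    (ties keep input order) and keep one only if its similarity to every
    sequence kept so far is below `threshold`."""
    kept: list[str] = []
    for seq in sorted(seqs, key=len, reverse=True):
        if all(similarity(seq, rep) < threshold for rep in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Hypothetical toy RNA sequences, not data from the study.
    toy = ["ACGUACGUACGU", "ACGUACGUACGA", "GGGCCCAAAUUU", "UUUAAAGGGCCC"]
    print(remove_redundancy(toy, threshold=0.8))
    # ['ACGUACGUACGU', 'GGGCCCAAAUUU', 'UUUAAAGGGCCC']
```

The second toy sequence is dropped because its similarity to the first is about 0.92. A greedy filter of this kind depends on input order and on the similarity measure, so a real pipeline would normally rely on a dedicated alignment or clustering tool.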
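The abstract summarizes performance as the average area under the ROC curve and under the precision-recall curve across fivefold cross-validation. The Python sketch below shows how such fold averages are commonly computed, assuming binary labels (1 = triplex-forming, 0 = not) and stratified folds; the record does not state how the folds were built. The convolutional network is replaced by a hypothetical one-dimensional scorer (`centroid_scorer`) so the example stays self-contained. AUROC uses the Mann-Whitney formulation, and the PR-curve area is estimated by average precision, which is one common estimator among several. All function names are illustrative, not taken from TriplexFPP.

```python
from __future__ import annotations

import random
from statistics import mean
from typing import Callable


def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs ranked correctly, with tied scores counting as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Average precision, an estimate of the area under the PR curve:
    the mean of precision@k over every rank k that holds a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / hits


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Deal the shuffled indices of each class round-robin into k folds, so
    every fold keeps roughly the overall positive/negative ratio."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid_scorer(train_x: list[float], train_y: list[int], test_x: list[float]) -> list[float]:
    """Hypothetical stand-in for the CNN: score each test value by its offset
    from the midpoint of the two class means (assumes positives lie higher)."""
    pos_mean = mean(x for x, y in zip(train_x, train_y) if y == 1)
    neg_mean = mean(x for x, y in zip(train_x, train_y) if y == 0)
    midpoint = (pos_mean + neg_mean) / 2
    return [x - midpoint for x in test_x]


def cross_validate(
    features: list[float],
    labels: list[int],
    train_and_score: Callable[[list[float], list[int], list[float]], list[float]],
    k: int = 5,
    seed: int = 0,
) -> tuple[float, float]:
    """Return (mean AUROC, mean average precision) over k stratified folds.
    Assumes every fold contains at least one positive and one negative."""
    rocs, prcs = [], []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores = train_and_score(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        test_y = [labels[i] for i in test_idx]
        rocs.append(auroc(test_y, scores))
        prcs.append(average_precision(test_y, scores))
    return mean(rocs), mean(prcs)


if __name__ == "__main__":
    # Hypothetical, perfectly separable toy data (not data from the study).
    y = [1] * 10 + [0] * 10
    x = [1.0 + i for i in range(10)] + [-1.0 - i for i in range(10)]
    print(cross_validate(x, y, centroid_scorer))  # (1.0, 1.0)
```

Averaging the per-fold values, rather than pooling all held-out scores into one curve, mirrors the "average values" wording in the abstract; with a heavily imbalanced positive class, the PR-curve average is usually the more informative of the two.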