Computational analysis and prediction of specific genomic regions forming R-loop structure and chromosomal variations associated with cancer

Bibliographic Details
Main Author: Wongsurawat, Thidathip
Other Authors: Vladimir Kuznetsov
Format: Theses and Dissertations
Language: English
Published: 2015
Online Access: https://hdl.handle.net/10356/62554
Institution: Nanyang Technological University
Description
Summary: An R-loop is a structure formed co-transcriptionally between a nascent RNA and its template DNA strand, leaving the non-template DNA strand unpaired. I hypothesized that R-loops could form in many mammalian genes and be associated with transcription and genetic instability. I developed a quantitative model of R-loop forming sequences (QmRLFS) and bioinformatics tools to predict R-loop forming sequences (RLFSs) in the human and mouse genomes. I collected the RLFSs predicted across both genomes into R-loopDB, a database of predicted R-loops (http://rloop.bii.a-star.edu.sg/). Most human and mouse genes (60%) contain RLFSs, and 11,773 evolutionarily conserved RLFSs map to 7,630 protein-coding genes and 117 ncRNA genes. Validation using experimental data showed that the model predicts RLFSs with high agreement. Integrative genomics analyses suggested that RLFSs could play a role in gene regulation, AID/APOBEC-mediated editing/mutagenesis, alternative splicing, and epigenetic modifications, and that they are also associated with mutations in cancer, neurodegenerative diseases, and mental disorders. Therefore, RLFSs represent novel therapeutic targets. A comparison of three RLFS prediction models indicates that QmRLFS is a promising approach for researchers interested in identifying RLFSs in both small- and large-scale data.