DL-CRISPR : a deep learning method for off-target activity prediction in CRISPR/Cas9 with data augmentation

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) system is a popular and easy-to-use gene-editing technique, but it carries off-target risk. Cutting at off-target sites can severely harm cells, so in silico methods are needed to help avoid it. Mos...

Bibliographic Details
Main Authors: Zhang, Yu, Long, Yahui, Yin, Rui, Kwoh, Chee Keong
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; CRISPR/Cas9; Data Augmentation
Online Access: https://hdl.handle.net/10356/145675
Institution: Nanyang Technological University
id sg-ntu-dr.10356-145675
record_format dspace
spelling sg-ntu-dr.10356-145675 2021-01-04T08:05:00Z DL-CRISPR : a deep learning method for off-target activity prediction in CRISPR/Cas9 with data augmentation Zhang, Yu Long, Yahui Yin, Rui Kwoh, Chee Keong School of Computer Science and Engineering Engineering::Computer science and engineering CRISPR/Cas9 Data Augmentation
Agency for Science, Technology and Research (A*STAR) Published version This work was supported by the A∗STAR-NTU-SUTD AI Partnership under Project RGANS1905. 2021-01-04T08:05:00Z 2021-01-04T08:05:00Z 2020 Journal Article Zhang, Y., Long, Y., Yin, R., & Kwoh, C. K. (2020). DL-CRISPR : a deep learning method for off-target activity prediction in CRISPR/Cas9 with data augmentation. IEEE Access, 8, 76610-76617. doi:10.1109/access.2020.2989454 2169-3536 https://hdl.handle.net/10356/145675 10.1109/ACCESS.2020.2989454 8 76610 76617 en RGANS1905 IEEE Access © 2020 IEEE. This journal is 100% open access, which means that all content is freely available without charge to users or their institutions. All articles accepted after 12 June 2019 are published under a CC BY 4.0 license, and the author retains copyright. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, as long as proper attribution is given. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
CRISPR/Cas9
Data Augmentation
description The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) system is a popular and easy-to-use gene-editing technique, but it carries off-target risk. Cutting at off-target sites can severely harm cells, so in silico methods are needed to help avoid it. Most existing in silico approaches rely on a relatively small positive dataset, and the data imbalance issue remains. In addition, some samples once considered negative were later shown to be positive. It is therefore essential to refresh the dataset and develop more accurate off-target activity prediction programs. In this work, we first extended the current positive dataset and explored the potential differences between positive and negative data based on the new dataset. We then adopted a new data augmentation method to address the data imbalance issue, and used an ensemble approach to take more negative data into consideration, bringing the model closer to the real scenario while keeping it balanced. Finally, we developed DL-CRISPR, a deep learning framework to predict off-target activity in CRISPR/Cas9. DL-CRISPR is evaluated and compared with other state-of-the-art methods on three kinds of datasets: 5-fold cross-validation test datasets, putative off-target datasets related to specific single guide RNAs (sgRNAs), and putative off-target datasets related to unseen sgRNAs. DL-CRISPR achieves the best average accuracy, 98.57%, on the 5-fold cross-validation datasets and correctly detects more off-targets on datasets related to both seen and unseen sgRNAs.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Zhang, Yu
Long, Yahui
Yin, Rui
Kwoh, Chee Keong
format Article
author Zhang, Yu
Long, Yahui
Yin, Rui
Kwoh, Chee Keong
author_sort Zhang, Yu
title DL-CRISPR : a deep learning method for off-target activity prediction in CRISPR/Cas9 with data augmentation
title_sort dl-crispr : a deep learning method for off-target activity prediction in crispr/cas9 with data augmentation
publishDate 2021
url https://hdl.handle.net/10356/145675
_version_ 1688665434360905728
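
The description above mentions using an ensemble to take more negative data into account while keeping the model balanced. The record does not say how this is done, so the following is only a minimal sketch of one common way to realize that idea: split the negatives into chunks the size of the positive set, pair each chunk with all positives, train one member per balanced subset, and average the member scores. The names `balanced_subsets` and `ensemble_score`, the `(sgRNA, site)` sample representation, and the callable "models" are all hypothetical and are not taken from DL-CRISPR.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample representation: (sgRNA sequence, candidate site sequence).
Sample = Tuple[str, str]


def balanced_subsets(
    positives: Sequence[Sample],
    negatives: Sequence[Sample],
    seed: int = 0,
) -> List[Tuple[List[Sample], List[Sample]]]:
    """Pair all positives with successive chunks of shuffled negatives.

    Each chunk holds at most len(positives) negatives, so every subset is
    roughly balanced. The final chunk may be smaller than the others.
    """
    if not positives:
        raise ValueError("need at least one positive sample")
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(positives)
    return [
        (list(positives), shuffled[start:start + size])
        for start in range(0, len(shuffled), size)
    ]


def ensemble_score(
    models: Sequence[Callable[[Sample], float]],
    sample: Sample,
) -> float:
    """Average the scores of all ensemble members for one sample."""
    return mean(model(sample) for model in models)
```

In this sketch each negative sample is used by exactly one member, so the ensemble as a whole sees all the negatives while no single member is trained on an imbalanced subset. Any classifier that maps a sample to a score could stand in for the members.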