Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes

Background: Detection of DNA copy number alterations (CNAs) is critical to understanding genetic diversity, genome evolution and pathological conditions such as cancer. Cancer genomes are plagued with widespread multi-level structural aberrations of chromosomes that pose challenges to discover CNAs of...

Bibliographic Details
Main Authors:	Ahmed Ibrahim Samir Khalil; Khyriem, Costerwell; Chattopadhyay, Anupam; Sanyal, Amartya
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2021
Subjects:	Engineering::Computer science and engineering; Cancer; DNA Copy Number Alteration
Online Access:	https://hdl.handle.net/10356/146948
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-146948
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering; Cancer; DNA Copy Number Alteration
spellingShingle	Engineering::Computer science and engineering; Cancer; DNA Copy Number Alteration; Ahmed Ibrahim Samir Khalil; Khyriem, Costerwell; Chattopadhyay, Anupam; Sanyal, Amartya; Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes
description	Background: Detection of DNA copy number alterations (CNAs) is critical to understanding genetic diversity, genome evolution and pathological conditions such as cancer. Cancer genomes are plagued with widespread multi-level structural aberrations of chromosomes, which make it challenging to discover CNAs of different length scales and of distinct biological origins and functions. Although several computational tools are available to identify CNAs using the read depth (RD) signal, they fail to distinguish between large-scale and focal alterations due to inaccurate modeling of the RD signal of cancer genomes. Additionally, the RD signal is affected by overdispersion-driven biases at low coverage, which significantly inflate false detection of CNA regions. Results: We have developed the CNAtra framework to hierarchically discover and classify ‘large-scale’ and ‘focal’ copy number gain/loss from a single whole-genome sequencing (WGS) sample. CNAtra first utilizes a multimodal-based distribution to estimate the copy number (CN) reference from the complex RD profile of the cancer genome. We implemented a Savitzky-Golay smoothing filter and Modified Varri segmentation to capture the change points of the RD signal. We then developed a CN state-driven merging algorithm to identify the large segments with distinct copy numbers. Next, we identified focal alterations in each large segment using coverage-based thresholding to mitigate the adverse effects of signal variations. Using cancer cell lines and patient datasets, we confirmed CNAtra’s ability to detect and distinguish segmental aneuploidies and focal alterations. To benchmark CNAtra against other single-sample detection tools, we used realistic simulated data in which we artificially introduced CNAs into the original cancer profiles. We found that CNAtra is superior in terms of precision, recall and F-measure. CNAtra shows the highest sensitivity, 93% and 97% for detecting large-scale and focal alterations, respectively. Visual inspection of CNAs revealed that CNAtra is the most robust detection tool for low-coverage cancer data. Conclusions: CNAtra is a single-sample CNA detection tool that provides an analytical and visualization framework for CNA profiling without relying on any reference control. It can detect chromosome-level segmental aneuploidies and high-confidence focal alterations, even from low-coverage data. CNAtra is open-source software implemented in MATLAB®. It is freely available at https://github.com/AISKhalil/CNAtra.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering; Ahmed Ibrahim Samir Khalil; Khyriem, Costerwell; Chattopadhyay, Anupam; Sanyal, Amartya
format	Article
author	Ahmed Ibrahim Samir Khalil; Khyriem, Costerwell; Chattopadhyay, Anupam; Sanyal, Amartya
author_sort	Ahmed Ibrahim Samir Khalil
title	Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes
title_sort	hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes
publishDate	2021
url	https://hdl.handle.net/10356/146948
_version_	1695706186201956352
spelling	sg-ntu-dr.10356-146948; 2021-03-15T06:22:20Z; Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes; Ahmed Ibrahim Samir Khalil; Khyriem, Costerwell; Chattopadhyay, Anupam; Sanyal, Amartya; School of Computer Science and Engineering; School of Biological Sciences; Engineering::Computer science and engineering; Cancer; DNA Copy Number Alteration; Ministry of Education (MOE); Published version; This work was supported by Nanyang Technological University’s Nanyang Assistant Professorship grant and Singapore Ministry of Education Academic Research Fund Tier 1 grants (RG46/16 and RG39/18) to AS. AC was supported by Nanyang Technological University start-up grant. The funding bodies were not involved in the design of the study, and collection, analysis, and interpretation of data, and in writing the manuscript.; 2021-03-15T06:22:20Z; 2021-03-15T06:22:20Z; 2020; Journal Article; Ahmed Ibrahim Samir Khalil, Khyriem, C., Chattopadhyay, A. & Sanyal, A. (2020). Hierarchical discovery of large-scale and focal copy number alterations in low-coverage cancer genomes. BMC Bioinformatics, 21. https://dx.doi.org/10.1186/s12859-020-3480-3; 1471-2105; https://hdl.handle.net/10356/146948; 10.1186/s12859-020-3480-3; 32299346; 21; en; RG46/16; RG39/18; BMC Bioinformatics; © 2020 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.; application/pdf
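
The hierarchy summarized in the description field of this record (smooth the RD signal, locate change points, then merge neighbouring segments that share a CN state) can be illustrated with a rough Python sketch. It is not CNAtra's MATLAB implementation. The function names, the simulated profile, the fixed 5-point Savitzky-Golay kernel, the thresholds, and the assumption that the CN reference `rd_ref` is already known are all hypothetical choices made for illustration. A simple difference-of-means detector stands in for Modified Varri segmentation, and focal-alteration calling is left out.

```python
import random
from statistics import mean

# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
# Assumption: the window CNAtra actually uses is not stated in this record.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def simulate_rd(seed=0, noise_sd=8.0):
    """Hypothetical RD profile: 100 bins at CN 2, 100 at CN 3, 100 at CN 2."""
    rng = random.Random(seed)
    levels = [100.0] * 100 + [150.0] * 100 + [100.0] * 100
    return [rng.gauss(level, noise_sd) for level in levels]


def savgol5(signal):
    """Smooth with the 5-point kernel; edges are padded by replication."""
    padded = [signal[0]] * 2 + list(signal) + [signal[-1]] * 2
    return [
        sum(c * padded[i + k] for k, c in enumerate(SG5))
        for i in range(len(signal))
    ]


def change_points(signal, half_window=10, min_jump=25.0):
    """Stand-in detector (NOT Modified Varri): report bin i when the mean of
    the next half_window bins differs from the mean of the previous
    half_window bins by more than min_jump and is a local maximum."""
    n = len(signal)
    jumps = [0.0] * n
    for i in range(half_window, n - half_window + 1):
        left = mean(signal[i - half_window:i])
        right = mean(signal[i:i + half_window])
        jumps[i] = abs(right - left)
    points = []
    for i in range(half_window, n - half_window + 1):
        if points and i - points[-1] < half_window:
            continue  # keep at most one change point per window
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        if jumps[i] > min_jump and jumps[i] == max(jumps[lo:hi]):
            points.append(i)
    return points


def call_segments(signal, points, rd_ref, ploidy=2):
    """Assign an integer CN state to each segment and merge adjacent
    segments with the same state. rd_ref is the RD of the reference
    CN state, assumed known here rather than estimated."""
    bounds = [0] + list(points) + [len(signal)]
    merged = []
    for start, end in zip(bounds, bounds[1:]):
        cn = round(ploidy * mean(signal[start:end]) / rd_ref)
        if merged and merged[-1][2] == cn:
            merged[-1] = (merged[-1][0], end, cn)  # extend previous segment
        else:
            merged.append((start, end, cn))
    return merged


rd = simulate_rd()
smoothed = savgol5(rd)
segments = call_segments(smoothed, change_points(smoothed), rd_ref=100.0)
for start, end, cn in segments:
    print(f"bins {start}-{end - 1}: CN {cn}")
```

Merging on the rounded CN state rather than on raw RD differences means a spurious change point inside a uniform region is absorbed into its neighbours, which is the intuition behind a state-driven merge. The exact boundaries reported depend on the simulated noise.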

Similar Items