Reconstruction of encoded data in DNA storage technology

Inevitable biomolecular errors in DNA storage technology could be resolved by designing robust error correction codes or intelligent clustering/decoding algorithms. The first objective of our work is to reconstruct the encoded DNA sequences read from the Illumina sequencer before decoding by studyi...

Bibliographic Details
Main Author:	Subhasiny, Sankar
Other Authors:	Erry Gunawan
Format:	Thesis-Master by Research
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Electrical and electronic engineering::Applications of electronics
Online Access:	https://hdl.handle.net/10356/156818
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-156818
record_format	dspace
spelling	sg-ntu-dr.10356-156818 2023-07-04T17:48:11Z Reconstruction of encoded data in DNA storage technology Subhasiny, Sankar Erry Gunawan School of Electrical and Electronic Engineering EGUNAWAN@ntu.edu.sg Engineering::Electrical and electronic engineering::Applications of electronics Inevitable biomolecular errors in DNA storage technology could be resolved by designing robust error correction codes or intelligent clustering/decoding algorithms. The first objective of our work is to reconstruct the encoded DNA sequences read from the Illumina sequencer before decoding by studying the efficiencies of the existing clustering tools in the biological domain and then modifying and tuning them and analyzing their applicability in the DNA data storage domain. The investigated tools and algorithms include Starcode, Cooperative Sequence Clustering, the majority nucleotide selection algorithm, Slidesort, and MeShClust. Comparing them, we observed that Starcode, the majority nucleotide selection algorithm, and Cooperative Sequence Clustering yield the highest recovery rates with less sequencing redundancy across three datasets. The portability of nanopore-based storage motivates the second objective: designing a nanopore-based DNA storage simulator that can serve as a tool for evaluating coding and clustering techniques. We simulated the DNA channel and the subsampling of sequenced data with a non-parametric subsampling method informed by the distribution of real nanopore DNA storage data, and then integrated it with DeepSimulator. The simulator's accuracy is evaluated by comparing its output with real nanopore reads. In addition, nanopore reads obtained from the simulator are clustered, and representatives of each cluster are extracted to reconstruct the encoded data. Master of Engineering 2022-04-26T01:51:52Z 2022-04-26T01:51:52Z 2022 Thesis-Master by Research Subhasiny, S. (2022). Reconstruction of encoded data in DNA storage technology. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156818 https://hdl.handle.net/10356/156818 10.32657/10356/156818 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering::Applications of electronics
spellingShingle	Engineering::Electrical and electronic engineering::Applications of electronics Subhasiny, Sankar Reconstruction of encoded data in DNA storage technology
description	Inevitable biomolecular errors in DNA storage technology could be resolved by designing robust error correction codes or intelligent clustering/decoding algorithms. The first objective of our work is to reconstruct the encoded DNA sequences read from the Illumina sequencer before decoding by studying the efficiencies of the existing clustering tools in the biological domain and then modifying and tuning them and analyzing their applicability in the DNA data storage domain. The investigated tools and algorithms include Starcode, Cooperative Sequence Clustering, the majority nucleotide selection algorithm, Slidesort, and MeShClust. Comparing them, we observed that Starcode, the majority nucleotide selection algorithm, and Cooperative Sequence Clustering yield the highest recovery rates with less sequencing redundancy across three datasets. The portability of nanopore-based storage motivates the second objective: designing a nanopore-based DNA storage simulator that can serve as a tool for evaluating coding and clustering techniques. We simulated the DNA channel and the subsampling of sequenced data with a non-parametric subsampling method informed by the distribution of real nanopore DNA storage data, and then integrated it with DeepSimulator. The simulator's accuracy is evaluated by comparing its output with real nanopore reads. In addition, nanopore reads obtained from the simulator are clustered, and representatives of each cluster are extracted to reconstruct the encoded data.
author2	Erry Gunawan
author_facet	Erry Gunawan Subhasiny, Sankar
format	Thesis-Master by Research
author	Subhasiny, Sankar
author_sort	Subhasiny, Sankar
title	Reconstruction of encoded data in DNA storage technology
title_short	Reconstruction of encoded data in DNA storage technology
title_full	Reconstruction of encoded data in DNA storage technology
title_fullStr	Reconstruction of encoded data in DNA storage technology
title_full_unstemmed	Reconstruction of encoded data in DNA storage technology
title_sort	reconstruction of encoded data in dna storage technology
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/156818
_version_	1772826684631810048

Similar Items