Coding for DNA data storage

Bibliographic Details
Main Author: Wang, Yixin
Other Authors: Erry Gunawan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/155241
Institution: Nanyang Technological University
Description
Summary: DNA has recently become an attractive medium for long-term data archiving owing to its extremely high density (zettabytes per gram), durability, and extremely low power consumption. Previous work has designed and implemented several prototypes of this emerging storage technique in which data was encoded, stored, and retrieved without errors, validating the feasibility of DNA data storage. However, current DNA storage systems remain limited on several measurable performance metrics, including achieved information capacity, net information density, and scalability of random access, and the storage channel itself is only partially understood. This research aims to characterize the storage channel and to design efficient coding (encoding/decoding) algorithms for constructing and implementing DNA storage systems that are effective, efficient, and scalable. Specifically, error control codes are tailored to the DNA storage scenario to provide resilience against the errors that inevitably occur during the storage process, i.e., DNA synthesis, PCR amplification, sample preparation, storage, and DNA sequencing. In addition, new constrained codes are proposed as a pre-processing step that converts data into a format suitable for storage in DNA. These codes enforce several biomedical constraints, because DNA strands satisfying them are more stable and more resistant to mutation. Overall, this work comprehensively studies DNA data storage technology with a focus on code, algorithm, and system design. It offers new design solutions in the form of several high-performing codes, algorithms, and system designs, and it provides new perspectives on data reconstruction strategies by investigating the error characteristics of the DNA storage channel. The results presented here are expected to help advance DNA data storage toward a more efficient and practical technology.
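
The summary does not say which constraints the proposed constrained codes enforce. As a rough illustration only, the sketch below checks two constraints that are commonly discussed for DNA storage, balanced GC content and a bound on homopolymer run length. Both constraints, the thresholds, the 2-bit-per-base mapping, and all function names are assumptions made for this example and are not taken from the thesis.

```python
from itertools import groupby

# Hypothetical 2-bit-per-nucleotide mapping (an assumption, not from the thesis).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_dna(bits: str) -> str:
    """Map a bit string of even length to a DNA string, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    if not strand:
        raise ValueError("strand must be non-empty")
    return sum(base in "GC" for base in strand) / len(strand)


def max_homopolymer_run(strand: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def satisfies_constraints(strand: str, gc_low: float = 0.4,
                          gc_high: float = 0.6, max_run: int = 3) -> bool:
    """Check the two assumed constraints; the thresholds are illustrative."""
    return (gc_low <= gc_content(strand) <= gc_high
            and max_homopolymer_run(strand) <= max_run)
```

Note that this only tests whether a given strand meets the assumed constraints. A constrained code in the sense of the summary would go further and encode arbitrary data so that every output strand passes such a check by construction.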