On the design of capacity-approaching error-correction codes for multi constrained systems

Today's common storage media have limited capacity to keep up with the current data explosion, which is a dominant motivator for developing novel storage technologies. Technological advancement in the biological sciences is not new, and DNA data storage is a beneficiary of breakth...

Bibliographic Details
Main Author: Zhang, Jiayu
Other Authors: Erry Gunawan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects: Engineering::Computer science and engineering::Data::Data storage representations
Online Access: https://hdl.handle.net/10356/151921
Institution: Nanyang Technological University
id sg-ntu-dr.10356-151921
record_format dspace
spelling sg-ntu-dr.10356-1519212023-07-07T17:58:39Z On the design of capacity-approaching error-correction codes for multi constrained systems Zhang, Jiayu Erry Gunawan School of Electrical and Electronic Engineering EGUNAWAN@ntu.edu.sg Engineering::Computer science and engineering::Data::Data storage representations Today's common storage media have limited capacity to keep up with the current data explosion, which is a dominant motivator for developing novel storage technologies. Technological advancement in the biological sciences is not new, and DNA data storage is a beneficiary of breakthroughs in bioinformatics and innovations from cross-disciplinary collaborations. Because of its potential to store data for centuries at high density, DNA is considered a promising solution to enormous data generation and storage requirements. DNA sequencing, which is part of the DNA data storage process, is error-prone. When analysing DNA nucleotide sequences, clustering plays a vital role in reducing redundancy and correcting errors. Most currently available software tools cluster sequences with greedy approaches, which do not always produce optimal results: they are very sensitive to a single parameter that decides the similarity among DNA sequences within one cluster. In general, this similarity is not known in advance, so the clusters generated by these greedy algorithms tend not to match the actual clusters if an imperfect parameter is used. As an unsupervised learning model, the mean shift algorithm has been used in several fields, such as descriptive statistics, audio processing, and computer vision. The mean shift algorithm is guaranteed to converge to a local optimum, which overcomes the limitations of greedy algorithms. MeShClust is an alignment-free clustering tool that applies the mean shift approach and a machine learning algorithm to cluster DNA sequences.
In this project, the MeShClust tool is implemented and its results are compared with those produced by the SlideSort algorithm on the same DNA sequence dataset. Bachelor of Engineering (Information Engineering and Media) 2021-07-08T00:39:28Z 2021-07-08T00:39:28Z 2021 Final Year Project (FYP) Zhang, J. (2021). On the design of capacity-approaching error-correction codes for multi constrained systems. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/151921 https://hdl.handle.net/10356/151921 en A3078-201 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Data::Data storage representations
spellingShingle Engineering::Computer science and engineering::Data::Data storage representations
Zhang, Jiayu
On the design of capacity-approaching error-correction codes for multi constrained systems
description Today's common storage media have limited capacity to keep up with the current data explosion, which is a dominant motivator for developing novel storage technologies. Technological advancement in the biological sciences is not new, and DNA data storage is a beneficiary of breakthroughs in bioinformatics and innovations from cross-disciplinary collaborations. Because of its potential to store data for centuries at high density, DNA is considered a promising solution to enormous data generation and storage requirements. DNA sequencing, which is part of the DNA data storage process, is error-prone. When analysing DNA nucleotide sequences, clustering plays a vital role in reducing redundancy and correcting errors. Most currently available software tools cluster sequences with greedy approaches, which do not always produce optimal results: they are very sensitive to a single parameter that decides the similarity among DNA sequences within one cluster. In general, this similarity is not known in advance, so the clusters generated by these greedy algorithms tend not to match the actual clusters if an imperfect parameter is used. As an unsupervised learning model, the mean shift algorithm has been used in several fields, such as descriptive statistics, audio processing, and computer vision. The mean shift algorithm is guaranteed to converge to a local optimum, which overcomes the limitations of greedy algorithms. MeShClust is an alignment-free clustering tool that applies the mean shift approach and a machine learning algorithm to cluster DNA sequences. In this project, the MeShClust tool is implemented and its results are compared with those produced by the SlideSort algorithm on the same DNA sequence dataset.
author2 Erry Gunawan
author_facet Erry Gunawan
Zhang, Jiayu
format Final Year Project
author Zhang, Jiayu
author_sort Zhang, Jiayu
title On the design of capacity-approaching error-correction codes for multi constrained systems
title_sort on the design of capacity-approaching error-correction codes for multi constrained systems
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/151921
_version_ 1772829021602578432
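
The description above says that mean shift converges to a local optimum instead of depending on a single greedy similarity cut-off. The following is a minimal sketch of that idea only, not MeShClust's implementation and not the project's code. It assumes sequences have already been mapped to numeric feature vectors (the 2-D toy points here are hypothetical), and the flat kernel, the `bandwidth` value, and the function names `mean_shift` and `label_by_mode` are illustrative choices.

```python
from math import dist


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Shift each point toward the mean of its neighbours until it stops moving."""
    modes = [tuple(p) for p in points]
    for _ in range(max_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Flat kernel: every original point within `bandwidth` counts equally.
            neighbours = [p for p in points if dist(m, p) <= bandwidth]
            centre = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
            moved = max(moved, dist(m, centre))
            new_modes.append(centre)
        modes = new_modes
        if moved < tol:  # every mode has settled on a local density peak
            break
    return modes


def label_by_mode(modes, bandwidth):
    """Give the same label to points whose converged modes nearly coincide."""
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            # No existing centre is close enough, so this mode starts a new cluster.
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres


# Hypothetical feature vectors forming two well-separated groups.
toy_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centres = label_by_mode(mean_shift(toy_points, bandwidth=1.0), bandwidth=1.0)
print(labels, centres)
```

The number of clusters is not supplied up front; it falls out of how many distinct modes the points converge to. The bandwidth still has to be chosen, so this sketch illustrates the convergence behaviour and makes no claim about how MeShClust selects its parameters.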