On the design of capacity-approaching error-correction codes for multi constrained systems

Bibliographic details
Main author: Zhang, Jiayu
Other authors: Erry Gunawan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online access: https://hdl.handle.net/10356/151921
Description
Summary: Current common storage media have limited capacity to keep pace with present data-explosion trends, which is a dominant motivator for developing novel storage technologies. Technological advancement in the biological sciences is not a new story, and DNA data storage is a beneficiary of breakthroughs in bioinformatics and innovations from cross-disciplinary collaborations. Because of its potential to store data for centuries at high density, DNA is considered a promising solution to the enormous data generation and storage requirement. DNA sequencing, which is part of the DNA data storage process, is error-prone. When analysing DNA nucleotide sequences, clustering plays a vital role in reducing redundancy and correcting errors. Most currently available software tools cluster sequences with greedy approaches, which do not always produce optimal results: they are very sensitive to a single parameter that determines the similarity among DNA sequences within one cluster. In general, the appropriate similarity is not known, so the sequence clusters generated by these greedy algorithms tend not to match the actual clusters if an imperfect parameter is used. As an unsupervised learning model, the mean shift algorithm has been used many times in fields such as descriptive statistics, audio processing, and computer vision. The mean shift algorithm guarantees convergence to a local optimum, which overcomes the limitations of greedy algorithms. MeShClust is an alignment-free clustering tool that applies the mean shift approach and a machine learning algorithm to cluster DNA sequences. In this project, the MeShClust tool is implemented and its results are compared with those produced by the SlideSort algorithm on the same DNA sequence dataset.