Evaluation and improvement of error correction tools for erroneous metagenomic reads

Metagenomics and its processes have a great impact in biological advances. With the introduction of NGS technologies to produce high throughput sequencing, efficiency is achieved but not the quality of the sequences. Thus, error correction tools were introduced to improve the quality of sequences. H...

Bibliographic Details
Main Author: Ho, Guanlin
Other Authors: Kwoh Chee Keong
Format: Final Year Project
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/10356/59048
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-59048
record_format dspace
spelling sg-ntu-dr.10356-59048 2023-03-03T20:43:14Z Evaluation and improvement of error correction tools for erroneous metagenomic reads Ho, Guanlin Kwoh Chee Keong School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity Bachelor of Engineering (Computer Science) 2014-04-22T01:45:45Z 2014-04-22T01:45:45Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/59048 en Nanyang Technological University 60 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
spellingShingle DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
Ho, Guanlin
Evaluation and improvement of error correction tools for erroneous metagenomic reads
description Metagenomics and its processes have had a great impact on biological advances. NGS technologies brought high-throughput sequencing and with it efficiency, but not sequence quality. Error correction tools were therefore introduced to improve the quality of sequences. However, there are many error correction tools to choose from, and they use different kinds of algorithms. In addition, sequences produced by different NGS technologies are biased towards different error characteristics. Three error correction tools were selected for benchmarking in this project: Coral, CD-HIT and USEARCH. The tools were used to correct simulated 454 Pyrosequencing and Illumina's Solexa reads. MapQ scores were used to compare the quality of the corrected reads against the original genome. Coral, which clusters reads by exact k-mers and corrects them using multiple alignments, produced the most accurate reads when correcting 454 Pyrosequencing reads, while USEARCH, which trims the 3' end and discards reads with high expected errors, performed best when correcting Illumina's reads. Based on these results, the project goes on to integrate USEARCH's fast clustering method with Coral's detailed individual read error correction method. The integrated method is called Fast Clustering Detailed Correction (FCDC). FCDC reduced the percentage of low-quality reads compared with the reads corrected by USEARCH and by Coral. Future developments of this project include improving the error correction method for Illumina's reads. The benchmarking process can also be extended to other NGS technologies such as Ion Torrent and SOLiD.
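The filtering idea mentioned in the description (trimming the 3' end, then discarding reads with high expected errors) can be illustrated with a minimal Python sketch. It is not the project's code and does not call USEARCH. It assumes that the expected error count of a read is the sum of its per-base error probabilities derived from Phred quality scores, and the function names and the thresholds `min_q` and `max_ee` are hypothetical values chosen only for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quals):
    """Expected number of erroneous bases: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


def trim_3prime(seq, quals, min_q):
    """Drop trailing (3' end) bases whose quality is below min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def filter_reads(reads, min_q=20, max_ee=1.0):
    """Trim each read at the 3' end, then keep it only if it is non-empty
    and its expected error count does not exceed max_ee.

    reads is an iterable of (sequence, list_of_phred_scores) pairs.
    min_q and max_ee are illustrative thresholds, not values from the project.
    """
    kept = []
    for seq, quals in reads:
        seq, quals = trim_3prime(seq, quals, min_q)
        if seq and expected_errors(quals) <= max_ee:
            kept.append((seq, quals))
    return kept


if __name__ == "__main__":
    # Toy reads with made-up quality scores, for demonstration only.
    reads = [
        ("ACGTAC", [40, 40, 40, 40, 10, 5]),  # low-quality tail is trimmed
        ("ACGT", [10, 10, 10, 10]),           # trimmed away entirely
        ("AAAAAA", [2, 2, 2, 2, 2, 30]),      # too many expected errors
        ("GGCC", [30, 30, 30, 30]),           # kept unchanged
    ]
    for seq, quals in filter_reads(reads):
        print(seq, round(expected_errors(quals), 4))
```

Trimming before filtering means a read with a poor tail can still be kept in shortened form, whereas a read with low quality spread along its length is discarded outright.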
author2 Kwoh Chee Keong
author_facet Kwoh Chee Keong
Ho, Guanlin
format Final Year Project
author Ho, Guanlin
author_sort Ho, Guanlin
title Evaluation and improvement of error correction tools for erroneous metagenomic reads
title_short Evaluation and improvement of error correction tools for erroneous metagenomic reads
title_full Evaluation and improvement of error correction tools for erroneous metagenomic reads
title_fullStr Evaluation and improvement of error correction tools for erroneous metagenomic reads
title_full_unstemmed Evaluation and improvement of error correction tools for erroneous metagenomic reads
title_sort evaluation and improvement of error correction tools for erroneous metagenomic reads
publishDate 2014
url http://hdl.handle.net/10356/59048
_version_ 1759854694985891840