An improved nearest neighbour algorithm for DNA sequence clustering

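This paper explores clustering algorithms for constructing a phylogenetic tree from distance measures such as the 18-dimensional vector distance. The main objective of the study is to investigate distance-based methods for constructing a phylogenetic tree and to design an efficient and accurate algorithm that reduces the computational time. We analyse the time complexity of the UPGMA method and introduce our approach, which is based on the idea of a ball tree. We perform the analysis on real datasets, including filoviruses, influenza viruses and bacterial genomes. In general, the new approach separates the species well and produces better results than the original ball tree method. The modified ball tree structure also greatly reduces the computational time.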
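The abstract mentions analysing the time complexity of UPGMA. The Python sketch below is a generic illustration of naive UPGMA and is not the author's implementation. It assumes that a symmetric pairwise distance matrix has already been computed, for example from a vector distance between sequences; how that matrix is built is an assumption and is not taken from the thesis. Each of the n - 1 merges scans every pair of active clusters, so the whole procedure takes O(n^3) time. The function name `upgma` and the Newick output are choices made for this sketch.

```python
from itertools import combinations


def upgma(dist, labels):
    """Naive UPGMA (illustrative sketch). Returns a Newick string.

    dist:   symmetric matrix (list of lists) of pairwise distances
    labels: leaf names, in the same order as the rows of dist
    Each of the n - 1 merges scans all active pairs (O(n^2)),
    so the whole procedure runs in O(n^3) time.
    """
    # Active clusters: id -> (Newick subtree, number of leaves, height of root)
    clusters = {i: (labels[i], 1, 0.0) for i in range(len(labels))}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        # O(n^2) scan for the closest pair of active clusters
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        h = dab / 2.0  # height of the new internal node (ultrametric tree)
        subtree = f"({ta}:{h - ha:g},{tb}:{h - hb:g})"
        # Size-weighted average distance from the merged cluster to each other cluster
        for k in clusters:
            dka = d.pop((min(k, a), max(k, a)))
            dkb = d.pop((min(k, b), max(k, b)))
            d[(k, next_id)] = (na * dka + nb * dkb) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = (subtree, na + nb, h)
        next_id += 1

    [(tree, _, _)] = list(clusters.values())
    return tree + ";"
```

For example, `upgma([[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]], ["A", "B", "C", "D"])` returns `(D:5,(C:3,(A:1,B:1):2):2);`. The closest-pair scan inside the loop is the dominant cost. A nearest-neighbour structure such as a ball tree is one way to reduce that cost, but this sketch does not reproduce the modified ball tree described in the thesis.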

Bibliographic Details
Main Author: Chen, Guizhen
Other Authors: Kiah Han Mao
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects: Science::Mathematics
Online Access:https://hdl.handle.net/10356/140743
Institution: Nanyang Technological University
Contributors: Kiah Han Mao, Xia Kelin
School: School of Physical and Mathematical Sciences
Contact: hmkiah@ntu.edu.sg, xiakelin@ntu.edu.sg
Degree: Bachelor of Science in Mathematical Sciences
Date available: 2020-06-01
File format: application/pdf
Topic: Science::Mathematics
Collection: DR-NTU (NTU Library, Singapore)
Record ID: sg-ntu-dr.10356-140743 (DSpace)
Record timestamp: 2023-02-28T23:13:46Z