An improved nearest neighbour algorithm for DNA sequence clustering

Bibliographic Details
Main Author: Chen, Guizhen
Other Authors: Kiah Han Mao
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access:https://hdl.handle.net/10356/140743
Institution: Nanyang Technological University
Description
Summary: This paper explores clustering algorithms for constructing a phylogenetic tree from distance measures such as the 18-dimensional vector distance. The main objective of the study is to investigate distance-based methods for constructing phylogenetic trees and to design an efficient and accurate algorithm that reduces computational time. We analyse the time complexity of the UPGMA method and introduce our approach, which is based on the ball tree data structure. We perform the analysis on real datasets, including filoviruses, influenza viruses and bacterial genomes. In general, the new approach separates the species well and produces better results than the original ball tree method. The modified ball tree structure also greatly reduces the computational time.
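To make the time-complexity point concrete, the following is a minimal sketch of textbook UPGMA on a precomputed distance matrix. It is not the project's implementation: the record does not say which UPGMA variant or data structures were used, and the function name `upgma` and its inputs are hypothetical. The sketch rescans all remaining cluster pairs for the closest one at every merge. With n sequences there are n - 1 merges, and each scan costs O(k^2) for k active clusters, so the naive algorithm runs in O(n^3) time. This cost is what nearest-neighbour structures such as ball trees aim to reduce. Only the tree topology is returned; branch lengths are omitted for brevity.

```python
from itertools import combinations


def upgma(labels, dist):
    """Naive UPGMA (hypothetical helper): repeatedly merge the closest pair of clusters.

    labels: list of leaf names; dist: symmetric n x n distance matrix (list of lists).
    Returns the tree topology as a Newick-like string without branch lengths.
    """
    # Active clusters: id -> (subtree string, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Distances between active clusters, keyed by (i, j) with i < j
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        # O(k^2) scan for the closest pair; done n - 1 times -> O(n^3) overall
        a, b = min(d, key=d.get)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)

        # UPGMA update: size-weighted average distance to every remaining cluster
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)

        # Drop every entry involving a or b, then add the merged cluster's row
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, v in new_d.items():
            d[(c, next_id)] = v  # c < next_id, so the key stays ordered

        clusters[next_id] = (f"({tree_a},{tree_b})", size_a + size_b)
        next_id += 1

    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    matrix = [[0, 2, 6, 10],
              [2, 0, 6, 10],
              [6, 6, 0, 4],
              [10, 10, 4, 0]]
    print(upgma(names, matrix))  # ((A,B),(C,D))
```

In practice, the distance matrix would be filled from whatever sequence representation is chosen, such as the vector distance mentioned above. Replacing the full pairwise scan with a nearest-neighbour query is where a ball tree could come in.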