An improved nearest neighbour algorithm for DNA sequence clustering
This paper explores clustering algorithms to construct a phylogenetic tree, based on distance measures such as the 18-dimensional vector distance. The main objective of the study is to investigate the distance-based methods to construct a phylogenetic tree and design an efficient and accurate algori...
Main Author: | Chen, Guizhen |
---|---|
Other Authors: | Kiah Han Mao; Xia Kelin |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Science::Mathematics |
Online Access: | https://hdl.handle.net/10356/140743 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-140743 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-140743 2023-02-28T23:13:46Z An improved nearest neighbour algorithm for DNA sequence clustering Chen, Guizhen Kiah Han Mao Xia Kelin School of Physical and Mathematical Sciences hmkiah@ntu.edu.sg, xiakelin@ntu.edu.sg Science::Mathematics This paper explores clustering algorithms to construct a phylogenetic tree, based on distance measures such as the 18-dimensional vector distance. The main objective of the study is to investigate the distance-based methods to construct a phylogenetic tree and design an efficient and accurate algorithm to reduce the computational time. We analyse the time complexity of the UPGMA method and introduce our approach with the idea of ball tree. We perform the analysis on actual datasets including the filoviruses, influenza viruses and bacterial genomes. Generally, the new approach is able to separate the species well and produce better results than the original ball tree method. The computational time is also greatly reduced by the modified ball tree structure. Bachelor of Science in Mathematical Sciences 2020-06-01T15:29:25Z 2020-06-01T15:29:25Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/140743 en application/pdf Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Science::Mathematics |
spellingShingle | Science::Mathematics Chen, Guizhen An improved nearest neighbour algorithm for DNA sequence clustering |
description | This paper explores clustering algorithms to construct a phylogenetic tree, based on distance measures such as the 18-dimensional vector distance. The main objective of the study is to investigate the distance-based methods to construct a phylogenetic tree and design an efficient and accurate algorithm to reduce the computational time. We analyse the time complexity of the UPGMA method and introduce our approach with the idea of ball tree. We perform the analysis on actual datasets including the filoviruses, influenza viruses and bacterial genomes. Generally, the new approach is able to separate the species well and produce better results than the original ball tree method. The computational time is also greatly reduced by the modified ball tree structure. |
author2 | Kiah Han Mao |
author_facet | Kiah Han Mao Chen, Guizhen |
format | Final Year Project |
author | Chen, Guizhen |
author_sort | Chen, Guizhen |
title | An improved nearest neighbour algorithm for DNA sequence clustering |
title_short | An improved nearest neighbour algorithm for DNA sequence clustering |
title_full | An improved nearest neighbour algorithm for DNA sequence clustering |
title_fullStr | An improved nearest neighbour algorithm for DNA sequence clustering |
title_full_unstemmed | An improved nearest neighbour algorithm for DNA sequence clustering |
title_sort | improved nearest neighbour algorithm for dna sequence clustering |
publisher | Nanyang Technological University |
publishDate | 2020 |
url | https://hdl.handle.net/10356/140743 |
_version_ | 1759854792190984192 |