LdClusterView : a system for automated analysis and visualization of genomics data

Bibliographic Details
Main Author: Salia, Sisi
Other Authors: Zheng Jie
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/73950
Institution: Nanyang Technological University
Description
Summary: In the study of genetics, researchers explore billions of deoxyribonucleic acid (DNA) bases to identify biologically interesting patterns. To make this voluminous data explorable, bioinformatics scientists have developed genome browsers that give researchers a platform to better understand the data. Similarly, in this project, the Singapore Immunity Network (SIgN) aimed to develop an interactive web-based visualization platform for researchers. Two visualizations were created: LdClusterView, an improvement on current genome browsers, and the Biostatistical Network Tool (BNT), a tool to identify genes of interest for further analysis. Most genome browsers visualize the relationships between different biological layers through multiple graphical plots stacked on top of each other, with a common horizontal axis representing the chromosome length. However, this layout shows only the spatial relationship between different biological data at various regions of the chromosome and does not depict the complex relationships between genetic variations. LdClusterView extended the basic layout of stacked plots by incorporating a dendrogram and a Sankey plot to describe the relationships between the stacked plots. These additions illustrated the relationships between the plots and the relationships between the internal elements of the plots, respectively. However, because a web application is limited in how much data it can display, only one gene could be displayed at a time. Therefore, another web application, BNT, was created to complement LdClusterView. BNT explored an emerging method of associating gene information with other types of biological data: it analysed the data through non-parametric tests, plots and a sub-network graph in the form of a Minimum Spanning Tree (MST) to identify interesting gene candidates for further exploration in LdClusterView. Both applications were implemented with HTML, CSS, JavaScript and the D3 library.
Both were optimized so that researchers could easily explore the data and produce visualizations for reporting purposes.
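
The summary mentions reducing a gene network to a sub-network in the form of a Minimum Spanning Tree. The record does not say which algorithm or edge weights BNT used, so the following is only a minimal sketch of the general idea, assuming Kruskal's algorithm and a hypothetical distance such as 1 - |correlation| between genes. The gene names, weights and the function name `minimum_spanning_tree` are illustrative and do not come from the project, which was implemented in JavaScript rather than Python.

```python
from typing import Dict, List, Tuple

# An edge is (gene_a, gene_b, distance). The distance is an assumed
# dissimilarity measure, e.g. 1 - |correlation|; smaller means more related.
Edge = Tuple[str, str, float]


def minimum_spanning_tree(edges: List[Edge]) -> List[Edge]:
    """Return the MST edges using Kruskal's algorithm with union-find."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    tree: List[Edge] = []
    # Consider the cheapest edges first; keep an edge only if it joins
    # two components that are not yet connected (i.e. it adds no cycle).
    for a, b, w in sorted(edges, key=lambda e: e[2]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


# Hypothetical example data, not results from the project.
edges = [
    ("GENE_A", "GENE_B", 0.2),
    ("GENE_B", "GENE_C", 0.5),
    ("GENE_A", "GENE_C", 0.4),
    ("GENE_C", "GENE_D", 0.1),
]
mst = minimum_spanning_tree(edges)
for a, b, w in mst:
    print(f"{a} -- {b} (distance {w})")
```

In this toy graph the tree keeps three of the four edges and drops the GENE_B to GENE_C link, since those genes are already connected through cheaper edges. A tree of this kind leaves each gene connected to the rest of the network through its strongest associations, which is one plausible way to surface candidates for closer inspection.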