Visualization and sharing of genomics data via a cloud based system

Bibliographic Details
Main Author: Chen, Guohao
Other Authors: Zheng Jie
Format: Final Year Project
Language: English
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10356/62704
Institution: Nanyang Technological University
Description
Summary: This Final Year Project, Visualization and Sharing of Genomics Data via a Cloud Based System, documents the relationships between Cloud Computing, Next-Generation Sequencing (NGS), Galaxy, Integrated Genome Browser (IGB) and UCSC Genome Browser. Because NGS involves vast amounts of genomics data, Galaxy (an open-source, web-based platform for data-intensive biomedical research) adopted Cloud Computing as a potential approach to the storage, processing and sharing of that data. The report includes a detailed guide covering data deposition, the installation of Galaxy and the hosting of Galaxy, with configurations and recommendations attached. Because Galaxy no longer supported the Windows platform, Ubuntu (a community-developed, GNU/Linux-based free and open-source operating system) was adopted as the Linux development platform. Development on Galaxy was also made possible by the API key that Galaxy generates, which lets users perform analyses from a terminal instead of the web interface. Galaxy was then migrated to the existing Cloud infrastructure of the School of Computer Engineering (SCE), Nanyang Technological University (NTU), where users could take advantage of its high availability, performance and scalable computing resources. Benchmarking was performed on a single workstation and on the NTU-SCE Cloud services, and the results show that the latter significantly outperformed the former. External web applications such as UCSC Genome Browser and Integrated Genome Browser (IGB) were also introduced to enhance the user experience of data analysis. A total of three recommendations for hosting Galaxy on the Cloud conclude that higher performance and availability come at great financial cost.