Building a database of cancer genomics data

Cancer is known to be a genetic disease. Because of its inherent complexity, large-scale genomics datasets are required to study it. The purpose of this project is to download publicly available data about cancer (including DNA sequences, gene expression, and clinical tr...

Bibliographic Details
Main Author: Chern, Shan Ni
Other Authors: Zheng Jie
Format: Final Year Project
Language: English
Published: 2015
Subjects:
Online Access: http://hdl.handle.net/10356/62848
Institution: Nanyang Technological University
id sg-ntu-dr.10356-62848
record_format dspace
spelling sg-ntu-dr.10356-62848 2023-03-03T20:32:09Z Building a database of cancer genomics data Chern, Shan Ni Zheng Jie School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems Cancer is known to be a genetic disease. Because of its inherent complexity, large-scale genomics datasets are required to study it. The purpose of this project is to download publicly available data about cancer (including DNA sequences, gene expression, and clinical trial data) from TCGA (The Cancer Genome Atlas), using RNASeqV2 data to determine gene expression levels, and to integrate the data into a NoSQL database. The motivation for this project was to provide a centralized database that speeds up cancer research and data analysis, and improves their accuracy, by making large-scale genomics datasets usable. Programming interfaces were developed to ease access to and maintenance of the database. Along with the interfaces, different functions were developed for the database administrator and the system user. The database administrator can insert, edit, and delete datasets in the database. To insert a dataset, the administrator selects a file to upload. To edit or delete data, the administrator selects or searches for a particular record in the database. The system user can run queries such as searching for a particular gene’s information or computing the average scaled estimate for a particular gene. The development tasks of this project closely follow the iterative Software Development Lifecycle (SDLC), which is presented in this report. Building on the developed database, further development and enhancement can be made to better serve the application's users. Future work could include building a larger database that allows the relationships between genes found in different cancers to be drawn out.
Bachelor of Engineering (Computer Science) 2015-04-30T02:38:38Z 2015-04-30T02:38:38Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/62848 en Nanyang Technological University 62 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems
author2 Zheng Jie
author_facet Zheng Jie
Chern, Shan Ni
format Final Year Project
author Chern, Shan Ni
author_sort Chern, Shan Ni
title Building a database of cancer genomics data
publishDate 2015
url http://hdl.handle.net/10356/62848
_version_ 1759853762172682240
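
The description in this record mentions administrator functions (insert, edit, delete) and user queries (look up a gene's information, average scaled estimate for a gene). The following is only a minimal sketch of how such operations might look, not the project's actual implementation. An in-memory dictionary stands in for the NoSQL database, and the class name `GeneExpressionStore`, its method names, and the field names `sample_id`, `gene`, and `scaled_estimate` are all hypothetical.

```python
from statistics import mean


class GeneExpressionStore:
    """In-memory stand-in for a NoSQL collection of gene expression records.

    Hypothetical illustration only: the real project's schema and API
    are not given in this record.
    """

    def __init__(self):
        self._docs = {}      # record id -> document (plain dict)
        self._next_id = 1

    # --- administrator operations ---

    def insert(self, sample_id, gene, scaled_estimate):
        """Store one record and return its generated id."""
        record_id = self._next_id
        self._next_id += 1
        self._docs[record_id] = {
            "sample_id": sample_id,
            "gene": gene,
            "scaled_estimate": float(scaled_estimate),
        }
        return record_id

    def edit(self, record_id, **fields):
        """Update selected fields of an existing record."""
        self._docs[record_id].update(fields)

    def delete(self, record_id):
        """Remove a record by id."""
        del self._docs[record_id]

    # --- user queries ---

    def find_gene(self, gene):
        """Return copies of all records for the given gene."""
        return [dict(d) for d in self._docs.values() if d["gene"] == gene]

    def average_scaled_estimate(self, gene):
        """Mean scaled estimate for a gene, or None if no records exist."""
        values = [d["scaled_estimate"]
                  for d in self._docs.values() if d["gene"] == gene]
        return mean(values) if values else None


store = GeneExpressionStore()
first = store.insert("sample-1", "TP53", 0.25)
second = store.insert("sample-2", "TP53", 0.75)
store.insert("sample-1", "BRCA1", 0.5)
print(store.average_scaled_estimate("TP53"))
```

In a document-oriented NoSQL database each record would typically be stored as a document much like the dictionaries above, with the average computed by the database's own aggregation facility rather than in application code.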