Quantitative data analysis

The 2014 Ebola epidemic is the largest in history and the first in West Africa, affecting multiple countries in the region. It has the potential to spread rapidly worldwide and cause the largest pandemic ever recorded. We are interested in understanding the molecular evolution of this e...

Bibliographic Details
Main Author: Xu, Yu
Other Authors: Su Haibin
Format: Final Year Project
Language: English
Published: 2015
Subjects: DRNTU::Science::Physics
Online Access: http://hdl.handle.net/10356/64856
Institution: Nanyang Technological University
id sg-ntu-dr.10356-64856
record_format dspace
spelling sg-ntu-dr.10356-64856 2023-02-28T23:16:46Z Quantitative data analysis Xu, Yu Su Haibin School of Physical and Mathematical Sciences DRNTU::Science::Physics Bachelor of Science in Physics 2015-06-09T01:27:19Z 2015-06-09T01:27:19Z 2015 2015 Final Year Project (FYP) http://hdl.handle.net/10356/64856 en 24 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Science::Physics
spellingShingle DRNTU::Science::Physics
Xu, Yu
Quantitative data analysis
description The 2014 Ebola epidemic is the largest in history and the first in West Africa, affecting multiple countries in the region. It has the potential to spread rapidly worldwide and cause the largest pandemic ever recorded. We are interested in understanding the molecular evolution of this emerging infectious disease, which has already killed more than 8000 people in West Africa alone, with a fatality rate of nearly 70% in certain regions. Ebola virus (EBOV) belongs to one of three genera in the family Filoviridae (the other two are Cuevavirus and Marburgvirus). Five strains of EBOV have been classified: four African strains, Tai Forest (also known as Ivory Coast), Sudan, Zaire, and Bundibugyo, and one Philippine strain, Reston. The Reston strain is not known to cause disease in humans. The fatality rate is around 40% for the Sudan and Bundibugyo strains and about 90% for the Zaire strain. The 2014 West Africa outbreak has been attributed to the Zaire strain. No effective vaccine has yet been produced. We adapted a sequence-to-structure approach at the macro and micro levels to identify hotspots of evolution in the EBOV genome. The average genome size of EBOV is 18,940 nucleotides. The ssRNA genome encodes seven proteins: NP, VP35, VP40, GP, VP30, VP24 and L. Of these seven proteins, NP (nucleoprotein) plays a critical role in virulence, so we focused on the NP gene to corroborate the molecular evolution results obtained with whole genomes. We collected all relevant genome and NP gene data from the NCBI Virus Variation Resource (http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/Database/nph-select2.cgi?cmd=database&taxid=186536). We chose seven sample NP protein sequences to represent the evolution of virulence over time and across geographic regions.
In this research, ClustalX2 was used for amino-acid sequence alignment, Jpred3 (http://www.compbio.dundee.ac.uk/jpred4/index.html) (Cole et al., 2008; Cuff et al., 1998) was used for secondary-structure prediction, and Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) (Kelley & Sternberg, 2009) was used for tertiary-structure prediction. The tertiary structure was visualized with VMD and DeepView-SwissPdb Viewer (Guex & Peitsch, 1997), using the PDB file produced by the Phyre2 server. Python was used to facilitate the selection of residue blocks, to ensure the accuracy of the subsequent analysis. We believe that our multitier approach to identifying virulence evolution hotspots in a fast-emerging infectious agent such as EBOV would enable faster drug and vaccine development. We also believe that our platform can be customized to predict and prevent the further spread of this deadly pathogen.
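The abstract states that Python was used to help select residue blocks but does not describe the procedure. The following is only a minimal sketch of one plausible reading, assuming that a "block" means a contiguous run of alignment columns in which the aligned sequences are not all identical. The function name `variable_blocks`, the `min_len` parameter, and the toy sequences are hypothetical illustrations, not the author's code or real NP data.

```python
from typing import List, Tuple


def variable_blocks(aligned: List[str], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, 0-based and end-exclusive, where
    the aligned sequences are not all identical.

    Assumption: a gap character such as '-' is treated like any other
    symbol, so a column containing a gap in only some sequences counts
    as variable.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    blocks: List[Tuple[int, int]] = []
    start = None  # first column of the block currently being extended
    for col in range(width):
        residues = {seq[col] for seq in aligned}
        if len(residues) > 1:
            # Variable column: open a block if none is open.
            if start is None:
                start = col
        elif start is not None:
            # Conserved column: close the open block if it is long enough.
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    # Close a block that runs to the end of the alignment.
    if start is not None and width - start >= min_len:
        blocks.append((start, width))
    return blocks


# Hypothetical toy alignment (three made-up sequences of equal length).
toy_alignment = [
    "MDSRPQKVWMT",
    "MDSRPQKIWMA",
    "MDPRPQKIWMT",
]
print(variable_blocks(toy_alignment))
```

In practice the input would come from an alignment file written by a tool such as ClustalX2, and the returned column ranges could then be mapped onto the predicted secondary and tertiary structures; how the original project did this is not stated in the record.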
author2 Su Haibin
author_facet Su Haibin
Xu, Yu
format Final Year Project
author Xu, Yu
author_sort Xu, Yu
title Quantitative data analysis
title_short Quantitative data analysis
title_full Quantitative data analysis
title_fullStr Quantitative data analysis
title_full_unstemmed Quantitative data analysis
title_sort quantitative data analysis
publishDate 2015
url http://hdl.handle.net/10356/64856
_version_ 1759856745640886272