Quantitative data analysis

Bibliographic Details
Main Author: Xu, Yu
Other Authors: Su Haibin
Format: Final Year Project
Language: English
Published: 2015
Online Access:http://hdl.handle.net/10356/64856
Institution: Nanyang Technological University
Description
Summary: The 2014 Ebola epidemic is the largest in history and the first in West Africa, affecting multiple countries in the region. It has the potential to spread rapidly worldwide and cause the largest pandemic ever recorded. We are interested in understanding the molecular evolution of this emerging infectious disease, which has already killed more than 8,000 people in West Africa alone, with a fatality rate of nearly 70% in certain regions. Ebola virus (EBOV) belongs to one of the three genera of the family Filoviridae (the other two are Cuevavirus and Marburgvirus). Five strains of EBOV have been classified: four African strains, Tai Forest (also known as Ivory Coast), Sudan, Zaire, and Bundibugyo, and one Philippine strain, Reston. Reston strains are not known to cause disease in humans. The fatality rate is around 40% for the Sudan and Bundibugyo strains and about 90% for the Zaire strain. The 2014 West African outbreak has been attributed to the Zaire strain. No effective vaccine has yet been produced. We have adopted a sequence-to-structure approach at the macro and micro levels to identify the hotspots of evolution in the EBOV genome. The average EBOV genome is about 18,940 nucleotides long. The ssRNA genome encodes seven proteins: NP, VP35, VP40, GP, VP30, VP24 and L. Of these seven, NP (nucleoprotein) plays a critical role in virulence, so we focused on the NP gene to corroborate the molecular evolution results obtained with whole genomes. We collected all relevant genome and NP gene data from the NCBI Virus Variation Resource (http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/Database/nph-select2.cgi?cmd=database&taxid=186536). We chose seven NP protein sequences to represent virulence evolution both over time and across geographic regions.
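The summary does not say how evolutionary hotspots were scored. One common way to flag variable positions among aligned protein sequences is per-column Shannon entropy. The sketch below illustrates that general technique and is not the authors' method. It assumes the NP sequences have already been aligned to equal length with gaps written as '-'. The function names, the zero threshold and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column; gap characters '-' are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # H = sum over residue types of p * log2(1/p); equals 0.0 for a fully conserved column
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def variable_positions(aligned: List[str], threshold: float = 0.0) -> List[Tuple[int, float]]:
    """Return (1-based column index, entropy) for every column whose entropy exceeds threshold."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for i in range(length):
        column = "".join(seq[i] for seq in aligned)
        h = column_entropy(column)
        if h > threshold:
            hits.append((i + 1, h))
    return hits


# Toy, made-up fragments (not real EBOV NP data)
toy = ["MDSRPQKIW", "MDSRPQKVW", "MDSRPHKIW"]
for pos, h in variable_positions(toy):
    print(pos, round(h, 3))  # prints "6 0.918" then "8 0.918"
```

Columns with higher entropy are more variable across the sampled sequences. Choosing a cut-off, and weighting by time or geography, would be separate decisions that the summary does not specify.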
In this research, ClustalX2 was used for amino acid sequence alignment, Jpred3 (http://www.compbio.dundee.ac.uk/jpred4/index.html) (Cole et al., 2008; Cuff et al., 1998) for secondary structure prediction, and Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) (Kelley & Sternberg, 2009) for tertiary structure prediction. The predicted tertiary structures were visualized with VMD and DeepView/Swiss-PdbViewer (Guex & Peitsch, 1997) using the PDB files generated by the Phyre2 server. Python was used to select residue blocks accurately for the subsequent analysis. We believe that our multitier approach to identifying virulence evolution hotspots in a fast-emerging infectious agent such as EBOV would enable faster drug and vaccine development. We also believe that our platform can be customized to predict and prevent the further spread of this deadly pathogen.
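The summary does not describe the Python scripts used to select residue blocks. The following is a minimal sketch under two assumptions: a "residue block" is a contiguous range of residue numbers, and the input is a standard fixed-column PDB file such as one returned by a structure-prediction server. The helpers `select_residue_block`, `residues_in_block` and `block_to_pdb` are hypothetical names for illustration only.

```python
from typing import List, Optional, Tuple


def select_residue_block(pdb_text: str, start: int, end: int,
                         chain: Optional[str] = None) -> List[str]:
    """Return ATOM/HETATM records whose residue number lies in [start, end].

    Assumes standard fixed PDB columns: chain ID in column 22,
    residue number in columns 23-26 (1-based).
    """
    selected = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END and other records
        if chain is not None and line[21] != chain:
            continue
        res_seq = int(line[22:26])
        if start <= res_seq <= end:
            selected.append(line)
    return selected


def residues_in_block(atom_lines: List[str]) -> List[Tuple[str, int, str]]:
    """List unique (chain, residue number, residue name) in order of first appearance."""
    seen: List[Tuple[str, int, str]] = []
    for line in atom_lines:
        key = (line[21], int(line[22:26]), line[17:20].strip())
        if key not in seen:
            seen.append(key)
    return seen


def block_to_pdb(atom_lines: List[str]) -> str:
    """Join the selected records into a minimal PDB text that a molecular viewer can load."""
    return "\n".join(atom_lines) + "\nEND\n"
```

For example, `select_residue_block(text, 50, 120, chain="A")` would keep only chain A records for residues 50 to 120, where `text` is the contents of a placeholder PDB file. Slicing fixed columns keeps the sketch dependency-free. A dedicated parser would be more robust to non-standard files.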