GENOME ANALYSIS OF 10,107 SARS-COV-2 SEQUENCES TO IDENTIFY THE PRESENCE OF SINGLE-NUCLEOTIDE POLYMORPHISMS

Bibliographic Details
Main Author: Hasna Syahira, Nandrea
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/61441
Institution: Institut Teknologi Bandung
Description
Summary: A new type of coronavirus was identified in Wuhan, China, in December 2019 and named SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus-2). The virus caused the COVID-19 outbreak, which WHO declared a pandemic on March 11, 2020. As of July 16, 2021, GISAID noted that more than 189 million people worldwide had been affected by this deadly virus, with a death toll of more than 4 million. The rapid spread of the virus is closely linked to the emergence of new variants carrying mutations that increase the transmissibility of SARS-CoV-2. In addition, the high mutation rate of SARS-CoV-2 makes it difficult to develop a vaccine that is effective against all variants. Substitution is the most common type of mutation in SARS-CoV-2. A substitution that occurs at high frequency, in at least 1% of the population, is referred to as a Single-Nucleotide Polymorphism (SNP). Most SNPs have no significant impact on gene expression, but some SNPs in protein-coding and non-coding regions can change protein structure, protein function, and the regulation of protein expression. This study was conducted to identify the genetic variability of SARS-CoV-2 in the form of SNPs and to analyze the impact of the SNPs that occur. The results are expected to help identify conserved regions in SARS-CoV-2 that can be used as probes for virus identification and as target regions in vaccine development. In this study, 15,000 SARS-CoV-2 sequences isolated in 35 countries were downloaded from GISAID: South Africa, Ghana, Kenya, Madagascar, Mali, Morocco, Mayotte, Mozambique, Reunion, China, Hong Kong, India, Indonesia, Japan, Malaysia, Russia, Saudi Arabia, Singapore, Taiwan, United States of America, Canada, Costa Rica, Mexico, Brazil, Peru, Colombia, Belgium, Spain, France, Italy, Turkey, Germany, Switzerland, Australia, and Guam. The samples span February 2020 to July 2021. Multiple sequence alignment was carried out using the MAFFT web server (version 7.481) with the progressive alignment method. The variants of these samples were identified using the Nextclade tool on the Nextstrain web server. The results show that the most common variants are 20B, 20A, and 20I (Alpha), accounting for 32.12%, 23.95%, and 17.39% of the total population, respectively. Next, SNP calling was performed using the SNP-sites program, and the SNPs were extracted using Excel. Of the 10,107 SARS-CoV-2 samples studied, 154 SNPs were found, with the highest numbers of SNPs located in the spike, nsp3, and nucleocapsid genes. The ratio of mutations per sequence length was also measured to determine which gene has the highest mutation density. This ratio was largest in the ORF8, ORF7a, and ORF7b genes, with values of 0.537, 0.474, and 0.419, respectively. These results indicate that a high number of SNPs in a gene is not by itself a measure of whether the mutation rate of that gene is high.
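The two quantities described in the summary, the 1% frequency threshold for calling a substitution an SNP and the ratio of mutations per sequence length, can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on SNP-sites and Excel. The function names `call_snps` and `snp_density`, the toy sequences, and the gene coordinates are hypothetical, and the sketch assumes the ratio is the number of SNP positions in a gene divided by the gene length, which the summary does not state explicitly.

```python
from collections import Counter


def call_snps(aligned, reference, min_freq=0.01):
    """Return {1-based position: {alt_base: frequency}} for substitutions
    whose frequency across the aligned sequences is at least min_freq.
    Gaps and ambiguous bases (anything outside A, C, G, T) are ignored."""
    n = len(aligned)
    snps = {}
    for pos, ref_base in enumerate(reference):
        counts = Counter(seq[pos] for seq in aligned)
        alts = {
            base: count / n
            for base, count in counts.items()
            if base in "ACGT" and base != ref_base and count / n >= min_freq
        }
        if alts:
            snps[pos + 1] = alts
    return snps


def snp_density(snps, genes):
    """genes maps a name to (start, end), 1-based and inclusive.
    Return {name: number of SNP positions in the gene / gene length}."""
    density = {}
    for name, (start, end) in genes.items():
        hits = sum(1 for pos in snps if start <= pos <= end)
        density[name] = hits / (end - start + 1)
    return density


# Toy alignment (hypothetical data): 200 sequences of length 10.
reference = "ACGTACGTAC"
common_mutant = "ACTTACGTAC"  # G -> T at position 3, present in 9 of 200 (4.5%)
rare_mutant = "ACGTCCGTAC"    # A -> C at position 5, present in 1 of 200 (0.5%)
aligned = [reference] * 190 + [common_mutant] * 9 + [rare_mutant]

snps = call_snps(aligned, reference)

# Hypothetical gene coordinates on the toy reference.
genes = {"geneA": (1, 4), "geneB": (5, 10)}
density = snp_density(snps, genes)
```

In this toy case only position 3 passes the 1% threshold; the substitution at position 5 is treated as a rare mutation rather than an SNP. Dividing by gene length is what allows a short gene with few SNPs to have a higher ratio than a long gene with many, which is the pattern the summary reports for ORF8, ORF7a, and ORF7b compared with spike and nsp3.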