EXPRESSION PROFILE AND DIFFERENTIAL ANALYSIS OF SOYBEANS (GLYCINE MAX) RNA-SEQ USING K-MEANS CLUSTERING APPROACH ON VARIOUS SEED DEVELOPMENTAL STAGES

Bibliographic Details
Main Author: Willyanto, Ryan
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/62347
Institution: Institut Teknologi Bandung
Description
Summary: Soybean (Glycine max) is a legume with high commercial value because of its high protein and lipid content. Besides being an important staple food, soybean is also used to produce oil. It is therefore important to study its gene expression activity during seed development and how that activity can benefit the uses of soybean. This study aims to determine the expression profiles and the pathways that change during seed development, based on k-means clustering analysis. The analysis used RNA-Seq samples from 5 developmental stages, with 2 replicates per stage: globular stage (GS), heart stage (HS), cotyledon stage (CS), maturation stage (MS), and dry seed (DS). The RNA-Seq analysis began with quality control of the reads using fastp (v0.20.1), followed by alignment using HISAT2 (v2.2.1) and assessment of alignment quality using Picard (v2.18.2.2). Reads were counted with htseq-count (v0.9.1), after which k-means clustering, enrichment analysis, and differential expression analysis (with DESeq2) were performed on the iDEP website (v0.93). The results showed 4 gene clusters (A, B, C, D) with different expression dynamics. Cluster A included the protein processing, galactose metabolism, and glutathione metabolism pathways, whose expression increased gradually across the developmental stages (GS to DS). Cluster B contained the diterpenoid biosynthesis pathway, whose expression peaked at MS. Cluster C comprised 10 pathways whose expression decreased, most notably flavonoid biosynthesis, which decreased significantly at DS. Cluster D, which included linoleic acid metabolism, also decreased gradually in expression during seed development.
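To illustrate the gene-wise k-means step described in the summary, here is a minimal Python sketch on a genes x stages count matrix. It is not the author's code and not iDEP's implementation: it assumes a log2(count + 1) transform, z-scores each gene across the five stages so genes are grouped by the shape of their profile rather than by absolute level, and uses deterministic farthest-point (maximin) seeding instead of random starts. The toy counts and gene names (`up_1`, `peak_MS_1`, and so on) are hypothetical, with one value per stage standing in for the two replicates.

```python
import math

STAGES = ["GS", "HS", "CS", "MS", "DS"]  # stage order used in the study


def log_transform(counts):
    """Apply log2(count + 1) to every value (an assumed, simple transform)."""
    return [[math.log2(c + 1) for c in row] for row in counts]


def standardize_rows(matrix):
    """Z-score each gene across stages so clusters reflect profile shape, not magnitude."""
    result = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
        result.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return result


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest(points, k):
    """Farthest-point seeding: start at the first gene, then repeatedly add the
    gene whose distance to its nearest chosen centroid is largest."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_genes(counts, k):
    """Transform, standardize, and cluster a genes x stages count matrix."""
    return kmeans(standardize_rows(log_transform(counts)), k)


# Hypothetical toy counts (one value per stage); not data from the study.
genes = ["up_1", "up_2", "down_1", "down_2", "peak_MS_1", "peak_MS_2"]
counts = [
    [5, 20, 80, 300, 900],
    [10, 30, 100, 400, 1000],
    [900, 400, 100, 30, 5],
    [1000, 300, 120, 20, 10],
    [10, 20, 50, 800, 40],
    [5, 15, 60, 700, 30],
]
labels, _ = cluster_genes(counts, k=3)
for name, lab in zip(genes, labels):
    print(name, lab)
```

With k = 3, the two rising, two falling, and two MS-peaking toy genes each share a cluster, which mirrors the kinds of trends (gradual increase, peak at MS, gradual decrease) reported for the real clusters. A real analysis would typically also filter lowly expressed genes and choose a transform suited to count data.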