Unraveling high-throughput demultiplexing techniques across multiple plant species

Bibliographic Details
Main Author: Maitra, Ishani
Other Authors: Marek Mutwil
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/176353
Institution: Nanyang Technological University
Description
Summary: RNA sequencing (RNA-seq) is essential for understanding biological mechanisms in plant biology. To sequence many samples simultaneously, RNA-seq samples are pooled together (multiplexed). Traditional demultiplexing methods often rely on expensive barcode matching and are prone to collisions, that is, misidentification of samples due to sequencing noise or inaccurate barcode assignment, especially in complex data. We therefore propose a cost-efficient demultiplexing method that can accommodate complex datasets. The method was tested on Arabidopsis thaliana, Brachypodium distachyon, and Oldenlandia corymbosa, with A. thaliana and B. distachyon subjected to dark stress treatment. The samples were pooled in various multiplex combinations. RNA-seq reads were aligned with HISAT2 to reference coding sequences (CDS), and a multiplex CDS reference was built by concatenating the reference CDS sets of the three species. A strong correlation was observed, suggesting that the multiplex CDS can be used for subsequent comparative analysis. Control read counts were scaled according to the observed linear relationship between O. corymbosa gene read counts in the control and treatment groups of the multiplex ABO samples. Differentially expressed genes (DEGs) were precisely identified using DESeq2 and a proposed differential gene expression analysis on the scaled control read counts. We demonstrate a promising, cost-efficient demultiplexing method capable of handling large and complex datasets without the need for barcoding.
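The summary above describes demultiplexing by species rather than by barcode. Reads are aligned to a single concatenated "multiplex CDS" reference, so the sequence each read aligns to identifies its species of origin. The Python sketch below illustrates that idea under stated assumptions and is not the thesis's code. The species tags ("Ath", "Oco") and the `tag|` header-prefix convention are hypothetical, as are the function names `build_multiplex_cds` and `reads_per_species`. Alignment itself (HISAT2) is not reproduced. The sketch only parses SAM-formatted output.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_multiplex_cds(references: Dict[str, str]) -> str:
    """Concatenate per-species CDS FASTA texts into one multiplex reference.

    Hypothetical convention: every header is prefixed with a species tag
    ("Ath|", "Bdi|", "Oco|", ...) so the sequence a read aligns to also
    tells us which species the read came from.
    """
    out: List[str] = []
    for tag, fasta_text in references.items():
        for line in fasta_text.splitlines():
            line = line.strip()
            if line.startswith(">"):
                out.append(f">{tag}|{line[1:]}")
            elif line:
                out.append(line)
    return "\n".join(out) + "\n"


def reads_per_species(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per species tag in SAM records."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        # SAM FLAG bits: 0x4 unmapped, 0x100 secondary, 0x800 supplementary
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        counts[rname.split("|", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    refs = {"Ath": ">AT1G01010.1\nATGGAG\n", "Oco": ">OC_g1\nATGCCC\n"}
    print(build_multiplex_cds(refs), end="")
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tAth|AT1G01010.1\t1\t60\t6M\t*\t0\t0\tATGGAG\tIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tATGCCC\tIIIIII",
        "r3\t16\tOco|OC_g1\t1\t60\t6M\t*\t0\t0\tATGCCC\tIIIIII",
    ]
    print(reads_per_species(sam))
```

Prefixing each header keeps the species identity in the reference name itself, so no barcode is needed to split the reads afterwards. In this toy run, the unmapped read `r2` is ignored and one read is attributed to each species.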
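The summary also describes a scaling step: control read counts are rescaled using the linear relationship between O. corymbosa read counts in the control and treatment groups. One plausible reading is sketched below in Python. It assumes that O. corymbosa acts as an unstressed internal reference present in both pools and that the relationship is a line through the origin (treatment ≈ b × control). The thesis's actual fit may differ. The function names and gene IDs are hypothetical, and the simple log2 fold change at the end is only an illustration, not DESeq2.

```python
import math
from typing import Dict, List


def fit_slope_through_origin(control: List[float], treatment: List[float]) -> float:
    """Least-squares slope b for treatment ≈ b * control (no intercept).

    Intended input: read counts of the same O. corymbosa genes in the control
    and treatment pools, used here as an assumed unstressed internal reference.
    """
    if len(control) != len(treatment) or not control:
        raise ValueError("need paired, non-empty count vectors")
    sxx = sum(c * c for c in control)
    if sxx == 0:
        raise ValueError("control counts are all zero")
    sxy = sum(c * t for c, t in zip(control, treatment))
    return sxy / sxx


def scale_control_counts(counts: Dict[str, float], slope: float) -> Dict[str, float]:
    """Put control counts of the stressed species on the treatment pool's scale."""
    return {gene: slope * c for gene, c in counts.items()}


def log2_fold_changes(
    scaled_control: Dict[str, float],
    treatment: Dict[str, float],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2((treatment + pc) / (scaled control + pc)) for genes present in both.

    With pseudocount=0, a zero count in either group raises an error.
    """
    return {
        gene: math.log2((treatment[gene] + pseudocount) / (scaled_control[gene] + pseudocount))
        for gene in treatment
        if gene in scaled_control
    }


if __name__ == "__main__":
    slope = fit_slope_through_origin([10, 20, 40], [20, 40, 80])
    scaled = scale_control_counts({"AT_g1": 50, "AT_g2": 100}, slope)
    print(slope, scaled)
    print(log2_fold_changes(scaled, {"AT_g1": 100, "AT_g2": 400}, pseudocount=0))
```

Fitting without an intercept treats the reference species' counts as differing only by a multiplicative depth factor between pools. This is the simplest assumption consistent with a "linear relationship". If the real data show a non-zero intercept, an ordinary least-squares fit with an intercept would be the natural alternative.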