Analysis of new long-read sequencing data


Bibliographic Details
Main Author: Phoa, Yohanes Alfredo
Other Authors: Kiah Han Mao
Format: Final Year Project
Language: English
Published: 2019
Online Access: http://hdl.handle.net/10356/77170
Institution: Nanyang Technological University
Description
Summary: The rapid development of powerful high-throughput sequencing technologies has enabled valuable insights into the complexity of the human transcriptome. In recent years, Oxford Nanopore has developed a technology that takes RNA directly as sequencing input and generates long reads. In this thesis, we use nanopore reads from synthetic RNA samples and employ machine-learning-based approaches to identify patterns that distinguish signals of modified RNA from those of their unmodified counterparts. First, we explored our dataset using a statistical test. We then proposed a simple baseline algorithm that learns the features distinguishing modified strands from unmodified strands. Finally, we proposed a novel method for detecting anomalies by sequence labeling using deep learning.