GENOMIC SELECTION USING MULTIPLE LINEAR REGRESSION WITH REGULARIZATION METHODS: RIDGE REGRESSION, LASSO, AND ELASTIC NET

Bibliographic Details
Main Author: Ethan Novriawan, Jeremy
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76254
Institution: Institut Teknologi Bandung
Description
Summary: Natural resources play a crucial role in sustaining human life, but their quality varies, and not all of them have the qualities humans desire. Advances in technology have produced breeding methods that yield resources with desired qualities. One method used to predict the outcome of breeding such biological resources is genomic selection, which seeks the relationship between an individual's observable phenotypes (characteristics) and its genetic values. Genetic values are obtained from Single Nucleotide Polymorphisms (SNPs), that is, from the individual's genetic values at specific positions. Knowing the relationship between phenotype and genotype makes it possible to predict the phenotype quality that results from manipulating an individual's genetic values. Prediction uses multiple linear regression. To improve predictive accuracy, the regularization techniques Ridge Regression, LASSO, and Elastic Net are applied. In this Final Project, a model is built using an open dataset from a journal article on cotton quality in Australia. The experiments show that models with regularization give the best results for genomic selection, and that predictions from data with preprocessed predictors and observations are more accurate than predictions from data that are not preprocessed.
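
The three regularized regressions named in the summary can be sketched with a single coordinate-descent routine. This is a minimal illustration and not the thesis's own implementation: the SNP matrix and phenotype values below are made up (they are not the cotton dataset), the names soft_threshold, center_columns, and elastic_net are hypothetical, the penalty parameterization (alpha, l1_ratio) is one common convention chosen here as an assumption, and column centering merely stands in for the preprocessing the summary mentions without describing.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the step that lets LASSO zero out SNPs)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center_columns(matrix):
    """Subtract each column's mean (an assumed, minimal preprocessing step)."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[row[j] - means[j] for j in range(len(means))] for row in matrix]


def elastic_net(X, y, alpha, l1_ratio, n_iter=500):
    """Coordinate descent minimizing
    (1/2n)*||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    l1_ratio=0 is ridge regression, l1_ratio=1 is LASSO, values between are elastic net.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            denom = sum(c * c for c in col) / n + alpha * (1 - l1_ratio)
            if denom == 0.0:
                continue  # constant (monomorphic) marker with no ridge penalty
            # Covariance of marker j with the residual that excludes its own effect
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_beta - beta[j]
            residual = [r - c * delta for c, r in zip(col, residual)]
            beta[j] = new_beta
    return beta


# Hypothetical toy data: 8 individuals x 4 SNPs coded as 0/1/2 allele counts.
snps = [
    [0, 1, 2, 0], [1, 0, 2, 1], [2, 1, 0, 0], [0, 2, 1, 1],
    [1, 1, 1, 2], [2, 0, 0, 1], [0, 2, 2, 2], [1, 0, 1, 0],
]
phenotype = [8.1, 9.9, 14.2, 8.8, 11.1, 13.9, 8.0, 11.0]  # made-up trait values

X = center_columns(snps)
mean_y = sum(phenotype) / len(phenotype)
y = [v - mean_y for v in phenotype]  # centering removes the need for an intercept

ols_like = elastic_net(X, y, alpha=0.0, l1_ratio=0.0)  # no penalty
ridge = elastic_net(X, y, alpha=1.0, l1_ratio=0.0)
lasso = elastic_net(X, y, alpha=0.5, l1_ratio=1.0)
enet = elastic_net(X, y, alpha=0.5, l1_ratio=0.5)

for name, coefs in [("unpenalized", ols_like), ("ridge", ridge),
                    ("lasso", lasso), ("elastic net", enet)]:
    print(name, [round(c, 3) for c in coefs])
```

In this sketch, ridge shrinks all marker effects toward zero, LASSO can set some of them exactly to zero, and elastic net mixes the two behaviors. In practice alpha and l1_ratio would be chosen by cross-validation on held-out individuals, which this toy example does not attempt.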