Protocol to identify functional doppelgängers and verify biomedical gene expression data using doppelgangerIdentifier

Functional doppelgängers (FDs) are independently derived sample pairs that confound machine learning model (ML) performance when assorted across training and validation sets. Here, we detail the use of doppelgangerIdentifier (DI), providing software installation, data preparation, doppelgänger ident...

Full description

Saved in:
Bibliographic Details
Main Authors: Wang, Li Rong, Fan, Xiuyi, Goh, Wilson Wen Bin
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164598
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Functional doppelgängers (FDs) are independently derived sample pairs that confound machine learning model (ML) performance when assorted across training and validation sets. Here, we detail the use of doppelgangerIdentifier (DI), providing software installation, data preparation, doppelgänger identification, and functional testing steps. We demonstrate examples with biomedical gene expression data. We also provide guidelines for the selection of user-defined function arguments. For complete details on the use and execution of this protocol, please refer to Wang et al. (2022).