BFT-GBRET: BIOGPT-2 FINE TUNED AND GAN-BERT FOR EXTRACTING DRUGS INTERACTION BASED ON BIOMEDICAL TEXTS


Bibliographic Details
Main Author: Arbi Parameswara, Made
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/85305
Institution: Institut Teknologi Bandung
Record ID: id-itb.:85305
Record timestamp: 2024-08-20T10:07:25Z
Keywords: extraction, DDI, imbalance handler, data augmentation, BioGPT-2, GAN-BERT
Library: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia
Description:

Drug-drug interactions (DDIs) occur when two or more drugs taken together react in the body, causing unexpected and potentially harmful effects. Identifying DDIs relies on annotated datasets such as DDI Extraction 2013, but the volume of new research publications is growing faster than it can be annotated, which makes the task challenging. Machine learning techniques, particularly deep learning, can extract and identify DDIs from biomedical literature efficiently. However, class imbalance in these datasets remains a significant problem that degrades model performance.

This study introduces BFT-GBRET, which combines data augmentation using the pretrained language model (PLM) BioGPT-2 with a generative adversarial network (GAN) to address class imbalance in DDI extraction. The research identifies gaps in existing studies of imbalance handlers and proposes to improve performance through PLM-based data augmentation and the use of unlabeled data in semi-supervised learning with a GAN. Together, the PLM and the GAN can generate high-quality data that closely resembles the original data, improving the model's ability to recognize and extract drug interactions from biomedical texts. BioGPT-2 generates additional data from labeled and unlabeled sources to enrich the training dataset. GAN-BERT then uses this data in a semi-supervised setting, allowing the model to learn from more complex and realistic data distributions and improving data quality and the model's generalization ability.

Evaluation results show that BFT-GBRET outperforms several baselines, with a significant increase in the micro F1-score for minority classes. Oversampling, the best baseline imbalance handler, achieves a micro F1-score of 0.8311, while BFT-GBRET achieves 0.8482 (a gain of 0.0171), indicating its effectiveness in handling class imbalance and contextual variation in biomedical data. The approach shows potential for broader application to biomedical NLP tasks and for improving the performance and reliability of clinical decision support systems.
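
For readers unfamiliar with the two evaluation concepts named in the description, the following is a minimal sketch, not the thesis's implementation. It shows one common way to compute a micro-averaged F1-score restricted to a chosen set of classes, and a plain random-oversampling baseline. The function names, the label strings, and the choice to exclude a "negative" class from the score are illustrative assumptions; the record does not state how the thesis defines them.

```python
import random
from collections import Counter


def micro_f1(y_true, y_pred, classes):
    """Micro-averaged F1 over `classes` only (hypothetical helper).

    Labels outside `classes` (for example a "negative" class) can still
    cause false positives or false negatives but are never counted as hits.
    """
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == t and p in classes:
            tp += 1
        else:
            if p in classes:  # predicted a scored class, but it was wrong
                fp += 1
            if t in classes:  # a scored class was missed
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of each smaller class until every class
    has as many samples as the largest one (hypothetical baseline)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels
```

Oversampling of this kind only repeats existing sentences, whereas the description's approach generates new text with BioGPT-2 and adds unlabeled data through GAN-BERT; the sketch is meant only to make the baseline and the metric concrete.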