BFT-GBRET: BIOGPT-2 FINE TUNED AND GAN-BERT FOR EXTRACTING DRUGS INTERACTION BASED ON BIOMEDICAL TEXTS
Main Author: | Arbi Parameswara, Made |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/85305 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:85305 |
---|---|
spelling |
id-itb.:85305; 2024-08-20T10:07:25Z; Arbi Parameswara, Made; keywords: extraction, DDI, imbalance handler, data augmentation, BioGPT-2, GAN-BERT |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Drug-drug interactions (DDI) occur when two or more drugs are used together and
react in the body, causing unexpected and potentially harmful effects. Identifying
DDI requires specific datasets such as DDI Extraction 2013, but the increasing
number of research publications without rapid data annotation makes this process
challenging. Machine learning techniques, particularly deep learning, can be used
to efficiently extract and identify DDI from biomedical literature. However, class
imbalance in the datasets remains a significant issue affecting model performance.
This study introduces BFT-GBRET, a combination of data augmentation using the
Pretrained Language Model (PLM) BioGPT-2 and Generative Adversarial
Network (GAN) to address class imbalance in DDI extraction tasks. The research
identifies gaps in existing imbalance handler studies and proposes performance
improvements through data augmentation by PLM and the use of unlabeled data in
semi-supervised learning with GAN. The combination of PLM and GAN can
generate high-quality data that closely resembles the original data, enhancing the
model's ability to recognize and extract drug interactions from biomedical texts.
BioGPT-2 is used for data augmentation, generating additional data from labeled
and unlabeled sources, enriching the training dataset. The augmented data is then used
to train GAN-BERT in a semi-supervised setting, allowing the model to learn from more complex
and realistic data distributions, thereby improving data quality and the model's
generalization ability. Evaluation results show that BFT-GBRET outperforms
several baselines, with a significant increase in the Micro F1-score for
minority classes. The Micro F1-score for oversampling, the best baseline imbalance
handler model, is 0.8311, while BFT-GBRET achieves 0.8482, indicating its
effectiveness in handling class imbalance and contextual variations in biomedical
data. This approach shows great potential for broader application in NLP tasks in
the biomedical field, enhancing the performance and reliability of clinical decision
support systems. |
format |
Theses |
author |
Arbi Parameswara, Made |
title |
BFT-GBRET: BIOGPT-2 FINE TUNED AND GAN-BERT FOR EXTRACTING DRUGS INTERACTION BASED ON BIOMEDICAL TEXTS |
url |
https://digilib.itb.ac.id/gdl/view/85305 |
_version_ |
1822999121571610624 |
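
The description reports results as a Micro F1-score (0.8311 for the oversampling baseline, 0.8482 for BFT-GBRET). As a minimal sketch of how such a score is commonly computed for relation extraction, the function below pools true positives, false positives, and false negatives over all interaction classes and skips a "no interaction" label. The label names, the `negative_label` parameter, and the exclusion of the negative class are assumptions made for illustration; the record does not state the thesis's exact evaluation protocol.

```python
def micro_f1(gold, pred, negative_label="none"):
    """Micro-averaged F1 over every label except `negative_label`.

    Assumption: the negative ("no interaction") class is excluded from the
    counts, as is common in DDI extraction evaluation. This is a sketch,
    not the thesis's actual evaluation script.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != negative_label:
            if p == g:
                tp += 1  # correct positive prediction
            else:
                fp += 1  # predicted an interaction type that is wrong
        if g != negative_label and p != g:
            fn += 1      # a gold interaction was missed or mislabeled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, loosely modeled on DDI interaction types.
gold = ["effect", "none", "mechanism", "advise", "none", "int"]
pred = ["effect", "effect", "none", "advise", "none", "mechanism"]
print(micro_f1(gold, pred))
```

Because the counts are pooled before precision and recall are computed, frequent classes dominate the score; per-class F1 values would be needed to see the effect on individual minority classes.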