BFT-GBRET: BIOGPT-2 FINE-TUNED AND GAN-BERT FOR EXTRACTING DRUG INTERACTIONS BASED ON BIOMEDICAL TEXTS
Main Author:
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/85305
Institution: Institut Teknologi Bandung
Summary: Drug-drug interactions (DDI) occur when two or more drugs are taken together and react in the body, causing unexpected and potentially harmful effects. Identifying DDI requires purpose-built datasets such as DDI Extraction 2013, but the volume of new research publications grows faster than data can be annotated, which makes this process challenging. Machine learning techniques, particularly deep learning, can extract and identify DDI from the biomedical literature efficiently; however, class imbalance in these datasets remains a significant issue affecting model performance.

This study introduces BFT-GBRET, which combines data augmentation using the pretrained language model (PLM) BioGPT-2 with a Generative Adversarial Network (GAN) to address class imbalance in DDI extraction. The research identifies gaps in existing imbalance-handling studies and proposes performance improvements through PLM-based data augmentation and the use of unlabeled data in semi-supervised learning with a GAN. Combining a PLM and a GAN can generate high-quality data that closely resembles the original data, enhancing the model's ability to recognize and extract drug interactions from biomedical texts. BioGPT-2 performs the augmentation, generating additional examples from labeled and unlabeled sources to enrich the training dataset. This data is then processed in a semi-supervised manner by GAN-BERT, allowing the model to learn from more complex and realistic data distributions, thereby improving data quality and the model's generalization ability.

Evaluation shows that BFT-GBRET outperforms several baselines, with a significant increase in micro F1-score on the minority classes. Oversampling, the best baseline imbalance handler, reaches a micro F1-score of 0.8311, while BFT-GBRET achieves 0.8482, indicating its effectiveness in handling class imbalance and contextual variation in biomedical data. The approach shows strong potential for broader application to NLP tasks in the biomedical domain, improving the performance and reliability of clinical decision support systems.
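To make the augmentation step in the abstract concrete, the following is a minimal sketch of PLM-based data generation for minority DDI classes. It is not the thesis's implementation: it uses the public `microsoft/biogpt` checkpoint from Hugging Face as a stand-in for the fine-tuned BioGPT-2 model, and it prompts with a labeled seed sentence rather than following the thesis's fine-tuning procedure.

```python
# Hedged sketch: sampling synthetic minority-class sentences from a biomedical
# causal LM. The checkpoint and prompting strategy are assumptions, not the
# thesis's exact setup.
from transformers import BioGptTokenizer, BioGptForCausalLM

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

def augment(seed_sentence: str, n: int = 3) -> list[str]:
    """Sample n varied continuations seeded by a labeled sentence."""
    inputs = tokenizer(seed_sentence, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,          # stochastic decoding yields varied samples
        top_p=0.9,
        max_new_tokens=40,
        num_return_sequences=n,
        pad_token_id=tokenizer.pad_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Seed with a minority-class sentence in the style of DDI Extraction 2013.
synthetic = augment("Aspirin may interact with warfarin and increase")
```

In a pipeline like the one the abstract describes, the sampled sentences would be filtered and added to the training set for the under-represented DDI classes before classifier training.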
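The semi-supervised step can likewise be sketched. GAN-BERT, as published by Croce et al. (2020), trains a discriminator over BERT sentence representations to predict k real classes plus one extra "fake" class, while a generator fabricates fake representations from noise; unlabeled sentences contribute by only having to avoid the fake class. The layer sizes below and the class count (k = 5, matching the DDI Extraction 2013 label set) are illustrative assumptions, not settings reported in this record.

```python
# Hedged sketch of the GAN-BERT components; dimensions are assumptions.
import torch
import torch.nn as nn

HIDDEN, NOISE, K = 768, 100, 5  # BERT hidden size, noise dim, DDI classes

generator = nn.Sequential(       # noise -> fake sentence representation
    nn.Linear(NOISE, HIDDEN), nn.LeakyReLU(0.2), nn.Linear(HIDDEN, HIDDEN),
)
discriminator = nn.Sequential(   # representation -> K real classes + "fake"
    nn.Linear(HIDDEN, HIDDEN), nn.LeakyReLU(0.2), nn.Linear(HIDDEN, K + 1),
)

# Real representations would come from a BERT encoder over labeled and
# unlabeled DDI sentences; random tensors stand in for them here.
real = torch.randn(8, HIDDEN)
fake = generator(torch.randn(8, NOISE))

logits_real = discriminator(real)  # labeled rows get a supervised loss;
logits_fake = discriminator(fake)  # unlabeled rows need only avoid class K+1
```

This is the mechanism by which unlabeled and augmented data can improve the classifier: the discriminator's real-versus-fake objective shapes the representation space even where labels are missing.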