BFT-GBRET: BIOGPT-2 FINE-TUNED AND GAN-BERT FOR EXTRACTING DRUG INTERACTIONS FROM BIOMEDICAL TEXTS

Bibliographic Details
Main Author: Arbi Parameswara, Made
Format: Theses
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/85305
Institution: Institut Teknologi Bandung
Description
Summary: Drug-drug interactions (DDI) occur when two or more drugs are taken together and react in the body, causing unexpected and potentially harmful effects. Identifying DDI requires specialized datasets such as DDI Extraction 2013, but the growing volume of research publications, without correspondingly rapid data annotation, makes this process challenging. Machine learning techniques, particularly deep learning, can efficiently extract and identify DDI from the biomedical literature. However, class imbalance in these datasets remains a significant issue affecting model performance.

This study introduces BFT-GBRET, which combines data augmentation using the pretrained language model (PLM) BioGPT-2 with a Generative Adversarial Network (GAN) to address class imbalance in DDI extraction tasks. The research identifies gaps in existing studies on imbalance handling and proposes performance improvements through PLM-based data augmentation and the use of unlabeled data in semi-supervised learning with a GAN. Combining a PLM with a GAN can generate high-quality data that closely resembles the original data, enhancing the model's ability to recognize and extract drug interactions from biomedical texts. BioGPT-2 is used for data augmentation, generating additional examples from labeled and unlabeled sources and enriching the training dataset. This data is then processed in a semi-supervised manner using GAN-BERT, allowing the model to learn from more complex and realistic data distributions, thereby improving data quality and the model's generalization ability.

Evaluation results show that BFT-GBRET outperforms several baselines, with a significant increase in the Micro F1-score for minority classes. The Micro F1-score of the best baseline imbalance handler, oversampling, is 0.8311, while BFT-GBRET achieves 0.8482, indicating its effectiveness in handling class imbalance and contextual variation in biomedical data. This approach shows strong potential for broader application to NLP tasks in the biomedical domain, enhancing the performance and reliability of clinical decision support systems.
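To make the semi-supervised GAN-BERT component named above concrete, the sketch below computes the standard GAN-BERT loss terms: the discriminator classifies over K real relation classes plus one extra "fake" class, and the generator is penalized when its samples are detected as fake. The class count and probability vectors are hypothetical illustrations, not values from the thesis; real training would obtain these probabilities from a BERT-based discriminator's softmax.

```python
import math

# GAN-BERT: the discriminator outputs probabilities over K real classes
# plus one extra "fake" class (here the last index). These helpers sketch
# the loss terms, assuming softmax probabilities are already computed.

def d_loss_supervised(probs, label):
    """Cross-entropy on a labeled real example (one of the K real classes)."""
    return -math.log(probs[label])

def d_loss_unsup_real(probs):
    """Unlabeled real example: the discriminator should NOT call it fake."""
    return -math.log(1.0 - probs[-1])

def d_loss_unsup_fake(probs):
    """Generated example: the discriminator SHOULD call it fake."""
    return -math.log(probs[-1])

def g_loss(probs):
    """Generator objective: its sample should be judged real, not fake."""
    return -math.log(1.0 - probs[-1])

# Hypothetical K = 4 DDI relation classes + 1 fake class.
p_labeled   = [0.70, 0.10, 0.10, 0.05, 0.05]  # labeled real example, gold label 0
p_generated = [0.05, 0.05, 0.05, 0.05, 0.80]  # sample produced by the generator

print(round(d_loss_supervised(p_labeled, 0), 4))  # low loss: confident and correct
print(round(d_loss_unsup_fake(p_generated), 4))   # low loss: fake was detected
print(round(g_loss(p_generated), 4))              # high loss: generator fooled D poorly
```

Minimizing the discriminator terms while the generator minimizes its own term is what lets unlabeled DDI sentences contribute to training: they only enter through the real-versus-fake terms, which require no relation labels.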