BFT-GBRET: FINE-TUNED BIOGPT-2 AND GAN-BERT FOR EXTRACTING DRUG INTERACTIONS FROM BIOMEDICAL TEXTS

Bibliographic Details
Main Author: Arbi Parameswara, Made
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/85305
Institution: Institut Teknologi Bandung
Description
Summary: Drug-drug interactions (DDIs) occur when two or more drugs taken together react in the body, causing unexpected and potentially harmful effects. Identifying DDIs requires purpose-built datasets such as DDI Extraction 2013, but the growing volume of research publications, without correspondingly fast data annotation, makes this process challenging. Machine learning techniques, particularly deep learning, can extract and identify DDIs from the biomedical literature efficiently; however, class imbalance in the datasets remains a significant issue affecting model performance. This study introduces BFT-GBRET, which combines data augmentation using the pretrained language model (PLM) BioGPT-2 with a Generative Adversarial Network (GAN) to address class imbalance in DDI extraction tasks. The research identifies gaps in existing studies on imbalance handling and proposes performance improvements through PLM-based data augmentation and through the use of unlabeled data in semi-supervised learning with a GAN. Combining a PLM with a GAN can generate high-quality data that closely resembles the original data, enhancing the model's ability to recognize and extract drug interactions from biomedical texts. BioGPT-2 is used for data augmentation, generating additional examples from labeled and unlabeled sources and thereby enriching the training set. These data are then processed in a semi-supervised manner with GAN-BERT, allowing the model to learn from more complex and realistic data distributions and improving both data quality and generalization. Evaluation results show that BFT-GBRET outperforms several baselines, with a marked increase in Micro F1-score on the minority classes: the best baseline imbalance handler, oversampling, reaches a Micro F1-score of 0.8311, while BFT-GBRET achieves 0.8482, indicating its effectiveness in handling class imbalance and contextual variation in biomedical data. This approach shows strong potential for broader application to biomedical NLP tasks, improving the performance and reliability of clinical decision support systems.
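The augmentation step described in the summary can be illustrated with a short sketch. This is not the thesis's implementation: it assumes the public HuggingFace checkpoint microsoft/biogpt as a stand-in for the fine-tuned BioGPT-2 model, and the seed prefix is a hypothetical labeled DDI sentence fragment used only for illustration.

# Minimal sketch: sampling synthetic DDI sentences from a biomedical causal LM.
# Assumes the "microsoft/biogpt" checkpoint as a stand-in for fine-tuned BioGPT-2.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# Hypothetical prefix taken from a labeled minority-class sentence.
seed = "The interaction between DRUG_A and DRUG_B"
inputs = tok(seed, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,            # sampling yields varied synthetic sentences
    top_p=0.92,
    num_return_sequences=4,    # several augmented candidates per seed
)
for seq in out:
    print(tok.decode(seq, skip_special_tokens=True))

In an augmentation pipeline of this kind, the generated sentences would inherit the seed's class label (or be left unlabeled for the semi-supervised stage), which is how minority classes can be enlarged without manual annotation.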
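For the semi-supervised stage, GAN-BERT (Croce et al., 2020) extends BERT with a generator that produces fake sentence representations and a discriminator that classifies a representation into the k real classes plus one "fake" class. Below is a minimal PyTorch sketch of those two components, not the thesis's code; it assumes BERT-base's 768-dimensional [CLS] representation and the five DDI Extraction 2013 labels (mechanism, effect, advise, int, negative).

import torch
import torch.nn as nn

NUM_CLASSES = 5   # mechanism, effect, advise, int, negative (DDI Extraction 2013)
HIDDEN = 768      # BERT-base [CLS] representation size
NOISE_DIM = 100   # assumed noise dimensionality

class Generator(nn.Module):
    # Maps Gaussian noise to a fake vector in BERT's representation space.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, HIDDEN),
            nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN, HIDDEN),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    # Scores a (real or fake) representation over k real classes + 1 fake class.
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(HIDDEN, HIDDEN),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.1),
        )
        self.head = nn.Linear(HIDDEN, NUM_CLASSES + 1)  # index NUM_CLASSES = "fake"

    def forward(self, h):
        features = self.body(h)
        return self.head(features), features  # features reused for feature matching

# Training (sketched): labeled [CLS] vectors incur cross-entropy over the k real
# classes, unlabeled vectors are only pushed away from the "fake" class, and the
# generator is trained to fool the discriminator (e.g., via feature matching).

Because unlabeled examples only need to avoid the "fake" class, the unlabeled pool, here enlarged by BioGPT-2's generated text, can shape the decision boundary without manual annotation, which is what lets the combination address class imbalance.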