BIOMEDICAL EVENTS EXTRACTION USING MULTI-LABEL CLASSIFICATION AND PRE-TRAINED BERT

Bibliographic Details
Main Author: Mulya, Dimmas
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/73358
Institution: Institut Teknologi Bandung
Description
Summary: Biomedical event extraction combines named-entity recognition (NER) and relation extraction (RE) to obtain a list of events from biomedical texts. The best-performing prior work uses sequence labeling with a joint method, a softmax decoder for event trigger identification, and a BioBERT v1.1 encoder. That model has several drawbacks: it is built with a joint method in which each task is executed independently, it gives no special handling to event triggers that carry multiple labels, and its BioBERT v1.1 encoder still uses a vocabulary drawn from non-biomedical domains. This thesis modifies the model to address these drawbacks. The joint method is replaced with a pipeline so that information can be passed forward between tasks; the softmax decoder in event trigger identification is replaced with a sigmoid decoder to handle multi-label triggers; and the encoder is replaced with a BERT encoder trained with a biomedical domain-specific vocabulary. To avoid overfitting to particular word patterns, event masking is also applied in the transitions between pipeline modules. The experiments compare the modified architecture with the original architecture from previous research, using the F1-score as the evaluation metric. Of these modifications, only the encoder built with a biomedical domain-specific vocabulary improved performance. Changing the joint method to a pipeline and changing the softmax decoder to a sigmoid in event trigger identification did not improve the model. The best model used the joint method, a softmax decoder, and the SciBERT encoder, with an F1-score of 64.50.
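
The difference between the softmax and sigmoid decoders described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the label names, the logit values, and the 0.5 threshold are assumptions chosen for illustration only. A softmax decoder picks exactly one label per token, while a sigmoid decoder scores each label independently and can return several.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_softmax(logits, labels):
    """Single-label decoding: return only the highest-probability label."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return [labels[best]]

def decode_sigmoid(logits, labels, threshold=0.5):
    """Multi-label decoding: return every label whose probability passes the threshold."""
    return [lab for lab, x in zip(labels, logits) if sigmoid(x) >= threshold]

# Illustrative trigger labels and logits for one token (assumed, not from the thesis).
LABELS = ["None", "Gene_expression", "Positive_regulation"]
logits = [-2.0, 1.5, 1.2]

softmax_result = decode_softmax(logits, LABELS)
sigmoid_result = decode_sigmoid(logits, LABELS)
print(softmax_result)
print(sigmoid_result)
```

With these assumed logits, the softmax decoder keeps one trigger label and the sigmoid decoder keeps both event labels, which is the multi-label case the modification targets.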