BIOMEDICAL EVENTS EXTRACTION USING MULTI-LABEL CLASSIFICATION AND PRE-TRAINED BERT

Biomedical event extraction combines named-entity recognition (NER) and relation extraction (RE) to obtain a list of events from biomedical texts. At present, the best biomedical event extraction research uses sequence labeling techniques with the joint meth...

Bibliographic Details
Main Author: Mulya, Dimmas
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/73358
Institution: Institut Teknologi Bandung
id id-itb.:73358
timestamp 2023-06-19T18:56:29Z
keywords biomedical event extraction, pipeline method, sequence labelling, BERT, multi-label classification
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesian
description Biomedical event extraction combines named-entity recognition (NER) and relation extraction (RE) to obtain a list of events from biomedical texts. At present, the best biomedical event extraction research uses sequence labeling with the joint method, a softmax decoder for event trigger identification, and the BioBERT v1.1 encoder. This model has three drawbacks: it is built with the joint method, in which each task is executed independently; it gives no special handling to event triggers that carry multiple labels; and its BioBERT v1.1 encoder still uses a vocabulary drawn from non-biomedical domains. This thesis modifies the model to address these drawbacks. The joint method is replaced with a pipeline so that information can be passed forward between tasks; in the event trigger identification task, the softmax decoder is replaced with a sigmoid to handle multi-label triggers; and the encoder is replaced with a BERT model trained with a biomedical domain-specific vocabulary. To avoid overfitting to particular word patterns, event masking is also applied in the transitions between pipeline modules. The experiments compare the modified architecture with the original architecture from previous research using the F1-score. Among these modifications, the performance improvement came from using an encoder built with a biomedical domain-specific vocabulary. Switching from the joint method to the pipeline, and from the softmax decoder to the sigmoid decoder in event trigger identification, did not improve the model. The best model used the joint method, a softmax decoder, and the SciBERT encoder, with an F1-score of 64.50.
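The difference between the softmax and sigmoid decoders described above can be illustrated with a minimal sketch. It is not the thesis implementation: the label names, the logit values, and the 0.5 threshold are assumptions chosen for illustration, and a real model would produce the logits from the encoder output for each token.

```python
import math


def softmax(logits):
    """Normalize logits into one distribution over all labels."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_softmax(logits, labels):
    """Single-label decoding: exactly one label wins per token."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return [labels[best]]


def decode_sigmoid(logits, labels, threshold=0.5):
    """Multi-label decoding: every label above the threshold is kept."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical trigger labels and logits for one token (illustration only).
LABELS = ["Gene_expression", "Positive_regulation", "Binding"]
logits = [2.0, 1.5, -3.0]

print(decode_softmax(logits, LABELS))  # one label
print(decode_sigmoid(logits, LABELS))  # possibly several labels
```

With these example logits the softmax decoder keeps only the highest-scoring label, while the sigmoid decoder keeps both labels whose independent probability exceeds the threshold. That is the multi-label behaviour the modification aims for, although the abstract reports that it did not improve the overall F1-score.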
format Theses
author Mulya, Dimmas
title BIOMEDICAL EVENTS EXTRACTION USING MULTI-LABEL CLASSIFICATION AND PRE-TRAINED BERT
url https://digilib.itb.ac.id/gdl/view/73358