BIOMEDICAL EVENTS EXTRACTION USING MULTI-LABEL CLASSIFICATION AND PRE-TRAINED BERT
Biomedical event extraction combines named-entity recognition (NER) and relation extraction (RE) to obtain a list of events from biomedical texts. At present, the best biomedical event extraction research uses sequence labeling techniques with the joint meth...
Main Author: | Mulya, Dimmas |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/73358 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:73358 |
---|---|
spelling |
id-itb.:73358; 2023-06-19T18:56:29Z; Mulya, Dimmas; keywords: biomedical event extraction, pipeline method, sequence labelling, BERT, multi-label classification |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Biomedical event extraction combines named-entity recognition (NER) and relation
extraction (RE) to obtain a list of events from biomedical texts. At present, the
best-performing approach uses sequence labeling with a joint method, a softmax
decoder for event trigger identification, and the BioBERT v1.1 encoder. However,
this model has several drawbacks: it is built with a joint method in which each task
is executed independently, it has no special handling for event triggers that carry
multiple labels, and its BioBERT v1.1 encoder still uses a vocabulary drawn from
non-biomedical domains.
This thesis modifies the biomedical event extraction model to address these
drawbacks. The joint method is replaced with a pipeline so that information can be
passed forward between tasks; in the event trigger identification task, the softmax
decoder is replaced with a sigmoid to handle multi-labels; and the encoder is
replaced with a BERT encoder trained with a biomedical domain-specific vocabulary.
To avoid overfitting to particular word patterns, event masking is also applied in
the transitions between pipeline modules. The experiment compares the modified
architecture with the original architecture from previous research, using the
F1-score as the evaluation metric.
Among the modifications, the performance improvement came from using an encoder
built with a biomedical domain-specific vocabulary. Changing the joint method to a
pipeline and changing the softmax decoder to a sigmoid in the event trigger
identification task did not improve the performance of the biomedical event
extraction model. The best model used the joint method, a softmax decoder, and the
SciBERT encoder, achieving an F1-score of 64.50. |
format |
Theses |
author |
Mulya, Dimmas |
title |
BIOMEDICAL EVENTS EXTRACTION USING MULTI-LABEL CLASSIFICATION AND PRE-TRAINED BERT |
_version_ |
1822007085416579072 |
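
The abstract contrasts a softmax decoder with a sigmoid decoder for event trigger identification and evaluates models by F1-score. The sketch below illustrates that distinction in general terms only; it is not the thesis implementation. The label names, logit values, the 0.5 threshold, and the helper names (`decode_softmax`, `decode_sigmoid`, `micro_f1`) are all assumptions made for illustration, and micro-averaging is one possible way to compute F1, not necessarily the one used in the thesis.

```python
import math


def softmax(logits):
    """Turn raw scores into a distribution that sums to 1 (labels compete)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_softmax(logits, labels):
    """Single-label decoding: always returns exactly one label (the arg-max)."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return {labels[best]}


def decode_sigmoid(logits, labels, threshold=0.5):
    """Multi-label decoding: keeps every label whose probability passes the
    threshold, so a token can carry zero, one, or several labels."""
    return {lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold}


def micro_f1(gold, pred):
    """Micro-averaged F1 over per-token label sets (illustrative choice)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical trigger token whose gold annotation has two event labels.
LABELS = ["Gene_expression", "Positive_regulation", "Binding"]
logits = [2.0, 1.5, -3.0]  # made-up scores, not model output
gold = [{"Gene_expression", "Positive_regulation"}]

single = decode_softmax(logits, LABELS)
multi = decode_sigmoid(logits, LABELS)

print("softmax:", sorted(single), "F1 =", round(micro_f1(gold, [single]), 4))
print("sigmoid:", sorted(multi), "F1 =", round(micro_f1(gold, [multi]), 4))
```

On this toy token the softmax decoder can return only one of the two gold labels, while the sigmoid decoder can return both. This shows the motivation for the change, not its measured effect: according to the abstract, replacing softmax with sigmoid did not improve the model in the reported experiments.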