COREFERENCE RESOLUTION IN INDONESIAN LANGUAGE USING WORD LEVEL COREFERENCE RESOLUTION ARCHITECTURE
Main Author:
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/66654
Institution: Institut Teknologi Bandung
Summary: Coreference resolution is a text-processing task whose goal is to find all mentions that refer to the same real-world entity. Coreference resolution can help other natural language processing tasks such as entity linking, machine translation, summarization, chatbots, and question answering.
Research on coreference resolution (coref) for Indonesian is still scarce, and existing Indonesian coref studies are difficult to compare with one another because they use different data.
Indonesian coref faces problems on both the dataset side and the algorithm side. On the dataset side, there is no standard dataset that can be used as a benchmark. On the algorithm side, no prior work applies the recent deep learning architectures that achieve competitive performance on English datasets, and the best previous work still uses a pipelined system approach.
This thesis is part of a joint research project between ITB, Prosa.ai, and AI Singapore. The project covers the creation of the Coreference Resolution in the Indonesian Language (COIN) dataset, whose annotation standards are adapted from the OntoNotes dataset, and modeling with the c2f-coref and wl-coref architectures. The scope of this thesis is to implement the program code and run experiments with the word-level coreference resolution (wl-coref) architecture. In addition, experiments on the Higher-order Coreference Resolution with Coarse-to-fine Inference (c2f-coref) architecture with several BERT encoder variants were carried out by engineers from AI Singapore. The analysis is performed jointly to compare the performance of the models.
The wl-coref architecture was chosen as the solution in this thesis because of its efficiency and competitive performance. The architecture first finds coreference links between word tokens and then performs span construction from the tokens that have coreference links. Adapting wl-coref required changing the pairwise (hand-crafted) features to use only the distance between spans, because the other pairwise features are not available in the COIN dataset. In addition, wl-coref requires dependency relations as input to its span construction module. Since this information is not available in the COIN dataset, the dependency relations are generated with the stanza library.
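As an illustration of this preprocessing step, the following is a minimal sketch (not the thesis code) of how dependency relations for Indonesian text can be obtained with the stanza library; the example sentence and the printed format are illustrative only.

    # Minimal sketch: generating dependency relations for Indonesian text with stanza,
    # e.g. to feed a span construction step that needs head/relation information.
    import stanza

    stanza.download("id")             # one-time download of the Indonesian models
    nlp = stanza.Pipeline(lang="id")  # default pipeline includes the dependency parser

    doc = nlp("Budi membeli buku itu karena dia menyukai ceritanya.")

    for sent in doc.sentences:
        for word in sent.words:
            # word.head is the 1-based index of the syntactic head (0 = root),
            # word.deprel is the Universal Dependencies relation label.
            head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
            print(f"{word.text:12s} --{word.deprel}--> {head}")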
Based on the experimental results, the wl-coref architecture (F1 score 76.24) is better than the c2f-coref architecture (F1 score 76.02), although the difference between the two is small. One possible cause is that the dependency relations used for Indonesian wl-coref are generated with stanza, whereas in English they are annotated manually, which can introduce additional errors into the Indonesian wl-coref architecture. The best encoder for both the wl-coref and c2f-coref architectures in Indonesian is XLM-RoBERTa-large. In addition, IndoSpanBERT-large delivers competitive performance just below XLM-RoBERTa-large, so it can be a good encoder choice with a lighter model size. Evaluation with the LEA metric shows that a model that scores well on the CoNLL metric tends to score well on the LEA metric as well, even though the two metrics have different calculation approaches.
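For reference, the following is a simplified sketch of the LEA calculation (Moosavi and Strube, 2016), included only to illustrate how its link-based scoring differs from the mention- and cluster-based metrics averaged in the CoNLL score; the helper names and the toy example are not from the thesis.

    # Simplified LEA sketch. Entities are sets of mentions; singleton entities are
    # skipped because OntoNotes-style data (which COIN follows) has no singletons.
    def links(n):
        """Number of coreference links in an entity of size n."""
        return n * (n - 1) // 2

    def lea(key_entities, response_entities):
        def score(source, target):
            num, den = 0.0, 0
            for e in source:
                if len(e) < 2:
                    continue
                resolved = sum(links(len(e & t)) for t in target)
                num += len(e) * resolved / links(len(e))
                den += len(e)
            return num / den if den else 0.0

        recall = score(key_entities, response_entities)
        precision = score(response_entities, key_entities)
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Toy example: one gold entity {a, b, c}; the system splits it into {a, b} and {c, d}.
    print(lea([{"a", "b", "c"}], [{"a", "b"}, {"c", "d"}]))  # (0.5, 0.333..., 0.4)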
Observing mention recall across several mention types and mention lengths shows that mention types with many instances tend to have better mention recall than mention types with few instances. In addition, the longer the mention, the lower the model's mention recall tends to be. The hyperparameter tuning experiment in this thesis shows that the default hyperparameters from Dobrovolskii's (2021) study are the best hyperparameters.
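The mention-recall analysis by length can be pictured with a small hypothetical helper like the one below (not from the thesis): mentions are (start, end) token spans, and recall is bucketed by span length.

    # Hypothetical sketch: mention recall bucketed by mention length.
    from collections import defaultdict

    def mention_recall_by_length(gold_mentions, predicted_mentions):
        predicted = set(predicted_mentions)
        found, total = defaultdict(int), defaultdict(int)
        for start, end in gold_mentions:
            length = end - start + 1
            total[length] += 1
            if (start, end) in predicted:
                found[length] += 1
        return {length: found[length] / total[length] for length in total}

    gold = [(0, 0), (2, 4), (6, 12)]
    pred = [(0, 0), (2, 4)]
    print(mention_recall_by_length(gold, pred))  # {1: 1.0, 3: 1.0, 7: 0.0}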