LEARNING THROUGH DISAGREEMENTS IN TEXT CLASSIFICATION: ANNOTATOR WEIGHTING AND LARGE LANGUAGE MODEL ASSISTED PREDICTION
Main Author: | Chandrasaputra, Christopher |
---|---|
Format: | Final Project |
Language: | Indonesia |
Subjects: | Natural Language Processing, Text Classification, Disagreement, Large Language Models, Annotator Weighting |
Online Access: | https://digilib.itb.ac.id/gdl/view/87586 |
Institution: | Institut Teknologi Bandung |
Description:
The progress in Natural Language Processing (NLP) has brought about challenges in managing disagreements within annotated datasets, particularly in text classification tasks. This final project explores innovative methods to tackle annotation discrepancies by employing multi-annotator modeling and predictions supported by Large Language Models (LLMs). The main objective is to enhance prediction accuracy by integrating annotator-specific weighting and leveraging LLMs to address conflicts.
The research centers on datasets from SemEval 2023, which span multiple domains and exhibit varying degrees of annotator disagreement. Two primary strategies were developed: (1) an annotator weighting mechanism that scores and adjusts each annotator's contribution according to their level of agreement, and (2) an LLM-assisted prediction system that supports decision-making on instances where annotators disagree. Experiments were carried out on resampled datasets with pre-trained language models to improve computational efficiency and robustness to ambiguous data.
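The two strategies can be pictured with a minimal sketch. The code below is illustrative only: the toy data, the agreement-based weighting formula, the decision margin, and the `ask_llm` placeholder are assumptions standing in for the project's actual implementation, which works on resampled SemEval 2023 data with pre-trained language models.

```python
# Illustrative sketch of the two strategies described above (toy data, not the
# project's actual code).
#  (1) Annotator weighting: weight each annotator by their agreement with the
#      per-item majority vote, then form weighted soft labels.
#  (2) LLM-assisted prediction: items whose soft label is still contested are
#      routed to an LLM; `ask_llm` is a hypothetical placeholder for that call.
from collections import Counter

# annotations[item_id][annotator_id] = label
annotations = {
    "t1": {"a1": 1, "a2": 1, "a3": 0},
    "t2": {"a1": 0, "a2": 0, "a3": 0},
    "t3": {"a1": 1, "a2": 0, "a3": 0},
}
LABELS = (0, 1)

# Per-item majority vote, used only to score annotators.
majority = {item: Counter(v.values()).most_common(1)[0][0]
            for item, v in annotations.items()}

# Annotator weight = share of annotated items matching the majority vote.
weights = {}
for a in sorted({a for v in annotations.values() for a in v}):
    pairs = [(item, v[a]) for item, v in annotations.items() if a in v]
    weights[a] = sum(lab == majority[item] for item, lab in pairs) / len(pairs)

def soft_label(votes):
    """Weighted, normalised label distribution for one item."""
    mass = {c: 0.0 for c in LABELS}
    for annotator, label in votes.items():
        mass[label] += weights[annotator]
    total = sum(mass.values()) or 1.0
    return {c: m / total for c, m in mass.items()}

def ask_llm(item_id, labels):
    """Hypothetical stand-in for prompting an LLM to pick one of `labels`."""
    return labels[0]

def predict(item_id, margin=0.5):
    """Keep the weighted vote when agreement is clear, else defer to the LLM.

    The 0.5 margin is an arbitrary illustrative threshold.
    """
    soft = soft_label(annotations[item_id])
    ranked = sorted(soft.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] - ranked[1][1] >= margin:   # clear agreement
        return ranked[0][0]
    return ask_llm(item_id, LABELS)             # disagreement: ask the LLM

for item in annotations:
    print(item, soft_label(annotations[item]), "->", predict(item))
```

In this toy run, items with a dominant weighted vote keep that label, while closely split items are handed to the placeholder LLM call, mirroring the division of labour between the two strategies.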
The results indicate that the combined strategy of annotator weighting and LLM-assisted prediction improves performance by up to 0.13 in F1-Micro and 0.059 in Cross Entropy relative to the baseline: the annotator weighting method highlights the influence of individual annotators, while the LLM-assisted method resolves predictions on items where annotators disagree. These insights contribute to a deeper understanding of disagreement in NLP tasks and support more accurate text classification.
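For reference, the two reported metrics can be sketched generically as micro-averaged F1 over hard labels and mean cross entropy against annotator-derived soft labels. The toy numbers below are illustrative assumptions; the project's actual scoring presumably follows the SemEval 2023 evaluation setup, which may differ in detail.

```python
# A minimal sketch of the two reported metrics, assuming hard labels for
# F1-Micro and soft (probabilistic) labels for Cross Entropy.
import math

def f1_micro(y_true, y_pred):
    """Micro-averaged F1 for single-label classification."""
    # Micro F1 pools TP/FP/FN over all classes; with exactly one predicted and
    # one gold label per item, FP == FN, so the score reduces to accuracy.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def cross_entropy(soft_true, soft_pred, eps=1e-12):
    """Mean cross entropy between gold soft labels and predicted distributions."""
    total = 0.0
    for p_true, p_pred in zip(soft_true, soft_pred):
        total += -sum(t * math.log(max(q, eps)) for t, q in zip(p_true, p_pred))
    return total / len(soft_true)

# Toy example: 3 items, binary labels.
gold_hard = [1, 0, 1]
pred_hard = [1, 0, 0]
gold_soft = [[0.3, 0.7], [1.0, 0.0], [0.4, 0.6]]   # annotator-derived
pred_soft = [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]]   # model output

print("F1-Micro:", round(f1_micro(gold_hard, pred_hard), 3))
print("Cross Entropy:", round(cross_entropy(gold_soft, pred_soft), 3))
```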