LEARNING THROUGH DISAGREEMENTS IN TEXT CLASSIFICATION: ANNOTATOR WEIGHTING AND LARGE LANGUAGE MODEL ASSISTED PREDICTION

The progress in Natural Language Processing (NLP) has brought about challenges in managing disagreements within annotated datasets, particularly in text classification tasks. This final project explores methods to tackle annotation discrepancies by employing multi-annotator modeling and predictions supported by Large Language Models (LLMs). The main objective is to enhance prediction accuracy by integrating annotator-specific weighting and leveraging LLMs to address conflicts. The research centers on datasets from SemEval 2023, which encompass multiple domains with diverse annotation variations. Two primary strategies were developed: (1) an annotator weighting mechanism to evaluate and adjust individual contributions based on levels of agreement, and (2) an LLM-assisted prediction system to aid decision-making in instances of disagreement. Experiments were carried out using resampled datasets and pre-trained language models to improve computational efficiency and robustness against ambiguous data. The results indicate that the combined strategy of Annotator Weighting and LLM-Assisted prediction improves prediction performance by up to 0.13 in F1-Micro score and 0.059 in Cross Entropy score compared to the baseline, with the annotator weighting method capturing the influence of individual annotators and the LLM-assisted method resolving predictions amid disagreements. These insights contribute to a deeper understanding of conflicts in NLP tasks, facilitating more precise text classification.
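
As a rough illustration of the two strategies summarized in the abstract, the minimal Python sketch below weights each annotator by how often they agree with the per-item majority label and defers unresolved ties to an LLM. This is not the author's implementation: the agreement measure, the weighted-vote aggregation rule, and the query_llm callable are illustrative assumptions.

from collections import Counter

def annotator_weights(annotations):
    # annotations: item_id -> {annotator_id: label}
    # Assumed weighting: share of items where the annotator matches the majority label.
    agree, total = Counter(), Counter()
    for labels in annotations.values():
        majority_label, _ = Counter(labels.values()).most_common(1)[0]
        for annotator, label in labels.items():
            total[annotator] += 1
            agree[annotator] += int(label == majority_label)
    return {a: agree[a] / total[a] for a in total}

def weighted_vote(labels, weights):
    # Sum annotator weights per label; return None if the top two labels tie.
    scores = Counter()
    for annotator, label in labels.items():
        scores[label] += weights.get(annotator, 1.0)
    ranked = scores.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # unresolved disagreement -> defer to the LLM step
    return ranked[0][0]

def resolve_with_llm(text, candidate_labels, query_llm):
    # query_llm is a caller-supplied function (hypothetical, not a specific API)
    # that takes a prompt string and returns the model's answer as a string.
    prompt = f"Classify the text as one of {sorted(candidate_labels)}.\nText: {text}"
    answer = query_llm(prompt).strip()
    return answer if answer in candidate_labels else None

In this sketch the LLM is consulted only when weighted voting still ties, mirroring the abstract's idea of using LLM-assisted prediction to resolve instances of disagreement rather than to label every item.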

Bibliographic Details
Main Author: Chandrasaputra, Christopher
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/87586
Institution: Institut Teknologi Bandung
Subjects: Natural Language Processing, Text Classification, Disagreement, Large Language Models, Annotator Weighting