Just adjust one prompt: Enhancing in-context dialogue scoring via constructing the optimal subgraph of demonstrations and prompts

The use of modern Large Language Models (LLMs) as chatbots still has some problems such as hallucinations and lack of empathy. Identifying these issues can help improve chatbot performance. The community has been continually iterating on reference-free dialogue evaluation methods based on large lang...

Bibliographic Details
Main Authors:	PU, Jiashu, CHENG, Ling, FAN, Lu, LV, Tangjie, ZHANG, Rongsheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Chatbots Context learning Dialogue evaluation Evaluation methods In contexts Language model Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/8751 https://ink.library.smu.edu.sg/context/sis_research/article/9754/viewcontent/2023.emnlp_main.590_pvoa.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9754
record_format	dspace
spelling	sg-smu-ink.sis_research-9754 2024-05-03T06:59:58Z 2023-12-01T08:00:00Z text application/pdf info:doi/10.18653/v1/2023.emnlp-main.590 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Chatbots Context learning Dialogue evaluation Evaluation methods In contexts Language model Artificial Intelligence and Robotics Numerical Analysis and Scientific Computing
description	Modern large language models (LLMs) used as chatbots still suffer from problems such as hallucination and lack of empathy, and identifying these issues can help improve chatbot performance. The community has been continually iterating on readily applicable, reference-free dialogue evaluation methods based on LLMs. However, many of these LLM-based metrics require selecting specific datasets and developing specialized training tasks for each evaluation dimension (e.g., coherence, informativeness). This development step can be time-consuming and may need to be repeated for every new evaluation dimension. To enable efficient and flexible adaptation to the diverse needs of dialogue evaluation, we propose a dimension-agnostic scoring method that leverages the in-context learning (ICL) capability of LLMs to learn from human scoring to the fullest extent. Our method has three key features. First, rather than crafting prompts manually, we generate them automatically by letting the LLM observe human labels and summarize the most suitable prompt. Second, since the LLM has a token limit and ICL is sensitive to variations in demonstrations, we train a selector to customize demonstrations and prompts for each dialogue input. Finally, during inference, we query the LLM multiple times with a subgraph of demonstrations and prompts that are both diverse and suitable, to maximize ICL from varied human scoring. We validate the efficacy of our method on five datasets: even with a small amount of annotated data, it outperforms all strong baselines. Code is available at https://github.com/iamlxb3/EMNLP2023-ADOROR.
format	text
author	PU, Jiashu CHENG, Ling FAN, Lu LV, Tangjie ZHANG, Rongsheng
author_facet	PU, Jiashu CHENG, Ling FAN, Lu LV, Tangjie ZHANG, Rongsheng
author_sort	PU, Jiashu
title	Just adjust one prompt: Enhancing in-context dialogue scoring via constructing the optimal subgraph of demonstrations and prompts
title_sort	just adjust one prompt: enhancing in-context dialogue scoring via constructing the optimal subgraph of demonstrations and prompts
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/8751 https://ink.library.smu.edu.sg/context/sis_research/article/9754/viewcontent/2023.emnlp_main.590_pvoa.pdf
_version_	1814047501461749760
