Examining the Inter-consistency of large language models: An in-depth analysis via debate

Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still suffer from inconsistency issues. Existing work primarily focuses on inconsistency within a single LLM; we complementarily explore the inter-consistency among multiple LLMs for collaboration. To examine whether LLMs can collaborate effectively to reach a consensus on a shared goal, we focus on commonsense reasoning and introduce a formal debate framework (FORD) that conducts a three-stage debate among LLMs, aligned with real-world scenarios: fair debate, mismatched debate, and roundtable debate. Extensive experiments on various datasets show that LLMs can collaborate effectively to reach a consensus despite noticeable inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs. Leveraging a more advanced LLM such as GPT-4 as an authoritative judge can further boost collaboration performance. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods. Code and data are available at https://github.com/WasteWood/FORD.
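
The sketch below illustrates what a debate of the kind described in the abstract might look like: two LLMs answer a commonsense question, exchange arguments for a few rounds, and a judge model issues a verdict. It is a minimal illustrative sketch, not the authors' FORD implementation; the LLM callables, prompts, and the debate/judging procedure shown here are assumptions, and the actual code and data are in the GitHub repository linked above.

# Minimal sketch of a two-LLM debate with an authoritative judge.
# NOT the authors' FORD implementation; model callables and prompts are placeholders.
from typing import Callable, Dict, List

Message = Dict[str, str]
LLM = Callable[[List[Message]], str]  # maps a chat history to a reply string


def debate(question: str, options: List[str],
           debater_a: LLM, debater_b: LLM, judge: LLM,
           rounds: int = 3) -> str:
    """Run a fixed number of debate rounds, then let a judge pick the answer."""
    task = (f"Question: {question}\nOptions: {', '.join(options)}\n"
            "Give your answer and a one-paragraph justification.")

    # Each debater first answers independently (its initial stance).
    stance_a = debater_a([{"role": "user", "content": task}])
    stance_b = debater_b([{"role": "user", "content": task}])
    transcript = [("A", stance_a), ("B", stance_b)]

    # Debaters then take turns responding to the debate so far.
    for _ in range(rounds):
        for name, model in (("A", debater_a), ("B", debater_b)):
            history = "\n\n".join(f"Debater {n}: {t}" for n, t in transcript)
            prompt = (f"{task}\n\nDebate so far:\n{history}\n\n"
                      f"You are debater {name}. Respond to the other debater, "
                      "then restate your current answer.")
            transcript.append((name, model([{"role": "user", "content": prompt}])))

    # An authoritative judge (e.g. a stronger LLM) reads the transcript and decides.
    history = "\n\n".join(f"Debater {n}: {t}" for n, t in transcript)
    verdict = judge([{"role": "user", "content":
                      f"{task}\n\nDebate transcript:\n{history}\n\n"
                      "As the judge, state only the final answer."}])
    return verdict

In this reading of the abstract, a mismatched debate would simply pass debaters of unequal capability, and a roundtable debate would generalise the loop to more than two debaters; the paper's actual prompts, stopping criteria, and judging setup are in the linked repository.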

Bibliographic Details
Main Authors: XIONG, Kai, DING, Xiao, CAO, Yixin, LIU, Ting, QIN, Bing
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Databases and Information Systems; Programming Languages and Compilers
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Online Access:https://ink.library.smu.edu.sg/sis_research/8391
https://ink.library.smu.edu.sg/context/sis_research/article/9394/viewcontent/2305.11595.pdf
Institution: Singapore Management University