A comprehensive evaluation of large language models on legal judgment prediction
Large language models (LLMs) have demonstrated great potential for domain-specific applications such as the law domain. However, recent disputes over GPT-4's law evaluation raise questions about their performance in real-world legal tasks. To systematically investigate their competency in the law, we design practical baseline solutions based on LLMs and test them on the task of legal judgment prediction. In our solutions, LLMs can work alone to answer open questions or coordinate with an information retrieval (IR) system to learn from similar cases or solve simplified multi-choice questions. We show that similar cases and multi-choice options, namely label candidates, included in prompts can help LLMs recall domain knowledge that is critical for expert legal reasoning. We additionally present an intriguing paradox wherein an IR system alone surpasses the performance of LLM+IR, because weaker LLMs gain little from a powerful IR system; in such cases, the role of the LLM becomes redundant. Our evaluation pipeline can be easily extended to other tasks to facilitate evaluations in other domains. Code is available at https://github.com/srhthu/LM-CompEval-Legal
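The abstract outlines three prompting settings: open questions, IR-augmented prompts that include similar cases, and simplified multi-choice prompts that include label candidates. Below is a minimal sketch of how such prompts might be assembled; the `build_prompt` helper, the prompt wording, and the example data are illustrative assumptions, not the authors' actual pipeline (that code is in the linked GitHub repository).

```python
# Illustrative sketch (not the authors' implementation) of the three
# prompting settings described in the abstract: open question,
# IR-augmented with similar cases, and multi-choice with label candidates.
# All wording and example data below are assumptions for illustration.

from typing import List, Optional


def build_prompt(
    case_facts: str,
    similar_cases: Optional[List[str]] = None,     # texts returned by an IR system
    label_candidates: Optional[List[str]] = None,  # multi-choice options
) -> str:
    """Compose a legal judgment prediction prompt.

    With neither optional argument this is the open-question setting;
    similar_cases gives the IR-augmented setting; label_candidates gives
    the multi-choice setting. The two can also be combined.
    """
    parts: List[str] = []
    if similar_cases:
        parts.append("Here are similar past cases for reference:")
        parts.extend(f"- {case}" for case in similar_cases)
    parts.append(f"Facts of the current case: {case_facts}")
    if label_candidates:
        parts.append("Choose the most likely charge from these options:")
        parts.extend(
            f"({i}) {label}" for i, label in enumerate(label_candidates, start=1)
        )
        parts.append("Answer with a single option.")
    else:
        parts.append("What charge should be predicted for this case?")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical example: a multi-choice prompt augmented with one similar case.
    print(build_prompt(
        case_facts="The defendant repeatedly withdrew funds from a client account.",
        similar_cases=["An employee who diverted company funds was convicted of embezzlement."],
        label_candidates=["theft", "fraud", "embezzlement"],
    ))
```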
Main Authors: SHUI, Ruihao; CAO, Yixin; WANG, Xiang; CHUA, Tat-Seng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Databases and Information Systems; Programming Languages and Compilers
DOI: 10.18653/v1/2023.findings-emnlp.490
License: http://creativecommons.org/licenses/by-nc-nd/4.0/ (CC BY-NC-ND 4.0)
Collection: Research Collection School Of Computing and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/8396
https://ink.library.smu.edu.sg/context/sis_research/article/9399/viewcontent/2310.11761.pdf
Institution: Singapore Management University