Beyond factuality: A comprehensive evaluation of large language models as knowledge generators

Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks when being prompted to generate world knowledge. Yet, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. In light of this,...

Bibliographic Details
Main Authors:	CHEN, Liang; DENG, Yang; BIAN, Yatao; QIN, Zeyu; WU, Bingzhe; CHUA, Tat-Seng; WONG, Kam-Fai
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Comprehensive evaluation; Down-stream; Empirical analysis; Evaluation framework; Informativeness; Knowledge evaluations; Knowledge intensive tasks; Language model; Retrieval techniques; World knowledge; Databases and Information Systems; Information Security
Online Access:	https://ink.library.smu.edu.sg/sis_research/9117 https://ink.library.smu.edu.sg/context/sis_research/article/10120/viewcontent/Beyond.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10120
record_format	dspace
spelling	sg-smu-ink.sis_research-10120 2024-08-01T14:39:27Z 2023-12-01T08:00:00Z text application/pdf info:doi/10.18653/v1/2023.emnlp-main.390 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Comprehensive evaluation; Down-stream; Empirical analysis; Evaluation framework; Informativeness; Knowledge evaluations; Knowledge intensive tasks; Language model; Retrieval techniques; World knowledge; Databases and Information Systems; Information Security
description	Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks when being prompted to generate world knowledge. Yet, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. In light of this, we introduce CONNER, a COmpreheNsive kNowledge Evaluation fRamework, designed to systematically and automatically evaluate generated knowledge from six important perspectives - Factuality, Relevance, Coherence, Informativeness, Helpfulness and Validity. We conduct an extensive empirical analysis of the generated knowledge from three different types of LLMs on two widely-studied knowledge-intensive tasks, i.e., open-domain question answering and knowledge-grounded dialogue. Surprisingly, our study reveals that the factuality of generated knowledge, even if lower, does not significantly hinder downstream tasks. Instead, the relevance and coherence of the outputs are more important than small factual mistakes. Further, we show how to use CONNER to improve knowledge-intensive tasks by designing two strategies: Prompt Engineering and Knowledge Selection. Our evaluation code and LLM-generated knowledge with human annotations will be released to facilitate future research.
format	text
author	CHEN, Liang; DENG, Yang; BIAN, Yatao; QIN, Zeyu; WU, Bingzhe; CHUA, Tat-Seng; WONG, Kam-Fai
author_sort	CHEN, Liang
title	Beyond factuality: A comprehensive evaluation of large language models as knowledge generators
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/9117 https://ink.library.smu.edu.sg/context/sis_research/article/10120/viewcontent/Beyond.pdf
_version_	1814047746742550528
