Benchmarking foundation models with language-model-as-an-examiner
Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model’s ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets,...
Main Authors: BAI, Yushi; YING, Jiahao; CAO, Yixin; LV, Xin; HE, Yuze; WANG, Xiaozhi; YU, Jifan; ZENG, Kaisheng; XIAO, Yijia; LYU, Haozhe; ZHANG, Jiayin; LI, Juanzi; HOU, Lei
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access: https://ink.library.smu.edu.sg/sis_research/8392
https://ink.library.smu.edu.sg/context/sis_research/article/9395/viewcontent/2306.04181.pdf
Institution: Singapore Management University
Similar Items
- Examining the Inter-consistency of large language models: An in-depth analysis via debate
  by: XIONG, Kai, et al.
  Published: (2023)
- CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models
  by: ZHANG, Tong, et al.
  Published: (2024)
- Early rumor detection using neural Hawkes process with a new benchmark dataset
  by: ZENG, Fengzhu, et al.
  Published: (2022)
- A comprehensive evaluation of large language models on legal judgment prediction
  by: SHUI, Ruihao, et al.
  Published: (2023)
- Is multi-hop reasoning really explainable? Towards benchmarking reasoning interpretability
  by: LV, Xin, et al.
  Published: (2021)