Benchmarking foundation models with language-model-as-an-examiner
Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model’s ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets,...
Main Authors: BAI, Yushi; YING, Jiahao; CAO, Yixin; LV, Xin; HE, Yuze; WANG, Xiaozhi; YU, Jifan; ZENG, Kaisheng; XIAO, Yijia; LYU, Haozhe; ZHANG, Jiayin; LI, Juanzi; HOU, Lei
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access: https://ink.library.smu.edu.sg/sis_research/8392
https://ink.library.smu.edu.sg/context/sis_research/article/9395/viewcontent/2306.04181.pdf
Institution: Singapore Management University
Similar Items
- Examining the Inter-consistency of large language models: An in-depth analysis via debate
  by: XIONG, Kai, et al.
  Published: (2023)
- CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models
  by: ZHANG, Tong, et al.
  Published: (2024)
- Early rumor detection using neural Hawkes process with a new benchmark dataset
  by: ZENG, Fengzhu, et al.
  Published: (2022)
- A comprehensive evaluation of large language models on legal judgment prediction
  by: SHUI, Ruihao, et al.
  Published: (2023)
- Is multi-hop reasoning really explainable? Towards benchmarking reasoning interpretability
  by: LV, Xin, et al.
  Published: (2021)