Automating dataset updates towards reliable and timely evaluation of Large Language Models

Large language models (LLMs) have achieved impressive performance across various natural language benchmarks, prompting a continual need to curate more difficult datasets for larger LLMs, which is costly and time-consuming. In this paper, we propose to automate dataset updating and provide systemati...

Bibliographic Details
Main Authors:	YING, Jiahao, CAO, Yixin, BAI, Yushi, SUN, Qianru, WANG, Bo, TANG, Wei, DING, Zhaojun, YANG, Yizhe, HUANG, Xuanjing, YAN, Shuicheng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Large language models; LLM; Dataset update; Benchmark update; Automation; Artificial Intelligence and Robotics
Online Access:	https://ink.library.smu.edu.sg/sis_research/9439
Institution:	Singapore Management University

Similar Items

Eigenvalues and switching algorithms for Quasi-Newton updates
by: Phua, P.K.H.
Published: (2014)

LLMs-as-instructors : Learning from errors toward automating model improvement
by: YING, Jiahao, et al.
Published: (2024)

We challenge you to certify your updates
by: Chen, S., et al.
Published: (2013)

View update in entity-relationship approach
by: Ling, T.W., et al.
Published: (2014)

Combating obsolescence: Predictors of technical updating among engineers
by: Aryee, S.
Published: (2013)

Querying and Updating XML Data based on Node Labeling Schemes
by: LI CHANGQING
Published: (2010)

QED: A novel quaternary encoding to completely avoid re-labeling in XML updates
by: Li, C., et al.
Published: (2013)

Efficient location updates for continuous queries over moving objects
by: Hsueh, Y.-L., et al.
Published: (2013)

Update on influenza vaccines
by: Tambyah, P.A.
Published: (2011)

Efficient processing of XML documents
by: WANG WEN QIANG
Published: (2010)

An adaptive updating protocol for reducing moving object database workload
by: Chen, S., et al.
Published: (2013)

Automatic Android deprecated-API usage update by learning from single updated example
by: HARYONO, Stefanus A., et al.
Published: (2020)

Update on influenza anti-virals
by: Tambyah, P.A.
Published: (2011)

Is the ground truth really accurate? Dataset purification for automated program repair
by: YANG, Deheng, et al.
Published: (2021)

Low rank update of singular values
by: Chu, D., et al.
Published: (2014)

Reliability analysis and optimal version-updating for open source software
by: Li, X., et al.
Published: (2014)

AndroEvolve: Automated update for Android deprecated-API usages
by: HARYONO, Stefanus A., et al.
Published: (2021)

Risk analysis of commitment-option contracts with forecast updates
by: Buzacott, J., et al.
Published: (2013)

A protocol for micro mobility management in next generation IPv6 networks
by: Sharma, A., et al.
Published: (2013)

The wavelet transform-domain LMS adaptive filter with partial subband-coefficient updating
by: Attallah, S.
Published: (2014)

AndroEvolve: Automated Android API update with data flow analysis and variable denormalization
by: HARYONO, Stefanus A., et al.
Published: (2022)

Real-time data-processing framework with model updating for digital twins of water treatment facilities
by: Wei, Yuying, et al.
Published: (2023)

Pattern space maintenance for data updates and interactive mining
by: Feng, M., et al.
Published: (2013)

Efficient encrypted data search with expressive queries and flexible update
by: NING, Jianting, et al.
Published: (2022)

Eigenspace updating for non-stationary process and its application to face recognition
by: Liu X., et al.
Published: (2018)

Contract law
by: GOH, Yihan, et al.
Published: (2018)

RISKY INVESTMENTS UNDER STATIC AND DYNAMIC INFORMATION ACQUISITION
by: TAN HONG MING
Published: (2021)

Image Denoising Via L1 Norm Regularization Over Adaptive Dictionary
by: HUANG XINHAI
Published: (2012)

DDE: From dewey to a fully dynamic XML labeling scheme
by: Xu, L., et al.
Published: (2013)

THE OPTIMAL OFFER DEADLINE WHEN FACING A BAYESIAN-LEARNING SEARCHER
by: HUANG YUNTAO
Published: (2020)

Just-In-Time obsolete comment detection and update
by: LIU, Zhongxin, et al.
Published: (2023)

Update recovery attacks on encrypted database within two updates using range queries leakage
by: NING, Jianting, et al.
Published: (2022)

Online parameter estimation and compensation of preisach hysteresis by SVD updating
by: Lei, L., et al.
Published: (2014)

Framework to evaluate and test defences against hallucination in large language model
by: Pan, Johnny Shi Han
Published: (2024)

Demystifying faulty code: Step-by-step reasoning for explainable fault localization
by: WIDYASARI, Ratnadira, et al.
Published: (2024)

Hardware-assisted live kernel function updating on Intel platforms
by: ZHOU, Lei, et al.
Published: (2024)

Location Update versus Paging Trade-Off in Cellular Networks: An Approach Based on Vector Quantization
by: ROY, Abhishek, et al.
Published: (2007)

Reconfigurable dielectric engineered WSe2/HZO mem-transistor
by: Tong, Tong, et al.
Published: (2024)

AUM - An IPv6 based approach for micromobility
by: Sharma, A., et al.
Published: (2013)

Labeling dynamic XML documents: An order-centric approach
by: XU LIANG
Published: (2011)