LLM Eval - a tmarechaux Collection

LLM Eval

updated Jun 21, 2024

Levels of AGI: Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 37

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5

A Survey on Evaluation of Large Language Models

Paper • 2307.03109 • Published Jul 6, 2023 • 42

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Paper • 2306.13651 • Published Jun 23, 2023 • 15

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 192

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Paper • 2403.04132 • Published Mar 7, 2024 • 39

τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Paper • 2406.12045 • Published Jun 17, 2024 • 8