- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
  Paper • 2407.10457 • Published • 22
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
  Paper • 2411.00640 • Published • 3
- Law of the Weakest Link: Cross Capabilities of Large Language Models
  Paper • 2409.19951 • Published • 53
Vignesh
Vigneshwaran
AI & ML interests
None yet
Recent Activity
updated a collection 4 days ago: evaluation
updated a collection 6 days ago: evaluation
updated a collection 7 days ago: RL
Organizations
Collections
5
Models
None public yet
Datasets
None public yet