Collections
Discover the best community collections!
Collections trending this week
-
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Paper • 2310.17567 • Published • 1 -
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models
Paper • 2310.15941 • Published • 6 -
Holistic Evaluation of Language Models
Paper • 2211.09110 • Published • 1 -
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Paper • 2306.04757 • Published • 5