Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2306.05685

Papers: Evaluation

Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

Paper • 2310.17567 • Published Oct 26, 2023 • 1
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

Paper • 2310.15941 • Published Oct 24, 2023 • 6
Holistic Evaluation of Language Models

Paper • 2211.09110 • Published Nov 16, 2022 • 1
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

Paper • 2306.04757 • Published Jun 7, 2023 • 4

Curated resources that support the use of LLMs to serve as automatic evaluators of other LLM outputs.

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 31
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 50
Generative Judge for Evaluating Alignment

Paper • 2310.05470 • Published Oct 9, 2023 • 1
Calibrating LLM-Based Evaluator

Paper • 2309.13308 • Published Sep 23, 2023 • 10

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 24
prometheus-eval/Feedback-Collection

Viewer • Updated Oct 14, 2023 • 100k • 266 • 94
prometheus-eval/prometheus-13b-v1.0

Text2Text Generation • Updated Oct 14, 2023 • 6.09k • 116
HuggingFaceH4/ultrafeedback_binarized

Viewer • Updated Jan 8 • 187k • 37.5k • 193

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs