-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 49 -
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1 -
Calibrating LLM-Based Evaluator
Paper • 2309.13308 • Published • 10
Collections
Discover the best community collections!
Collections including paper arxiv:2305.13711
-
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 21 -
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1 -
Humans or LLMs as the Judge? A Study on Judgement Biases
Paper • 2402.10669 • Published -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 21 -
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Paper • 2303.16634 • Published • 1 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 49
-
Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Paper • 2208.05309 • Published • 1 -
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Paper • 2305.13711 • Published • 2 -
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
Paper • 2302.09664 • Published • 2 -
BARTScore: Evaluating Generated Text as Text Generation
Paper • 2106.11520 • Published • 1
-
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Paper • 2303.16634 • Published • 1 -
miracl/miracl-corpus
Viewer • Updated • 6.3k • 39 -
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 21 -
How is ChatGPT's behavior changing over time?
Paper • 2307.09009 • Published • 22
-
yentinglin/Taiwan-LLM-8x7B-DPO
Text Generation • Updated • 2.46k • 16 -
yentinglin/Taiwan-LLM-13B-v2.0-chat
Text Generation • Updated • 864 • 46 -
Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model
Paper • 2311.17487 • Published • 2 -
yentinglin/Taiwan-LLM-13B-v2.0-chat-awq
Text Generation • Updated • 48 • 3