-
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 20 -
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1 -
Humans or LLMs as the Judge? A Study on Judgement Biases
Paper • 2402.10669 • Published -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31
Collections
Discover the best community collections!
Collections including paper arxiv:2310.17631
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 20 -
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Paper • 2303.16634 • Published • 1 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 48
-
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 23 -
On the Societal Impact of Open Foundation Models
Paper • 2403.07918 • Published • 16 -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Instruction Tuning for Large Language Models: A Survey
Paper • 2308.10792 • Published • 1
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 5 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 23 -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 48
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 90 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 102
-
Large Language Model Alignment: A Survey
Paper • 2309.15025 • Published • 2 -
Aligning Large Language Models with Human: A Survey
Paper • 2307.12966 • Published • 1 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 37 -
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Paper • 2310.05344 • Published • 1
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 31 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 48 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 90 -
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 17
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 135 -
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 26 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 19 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 62
-
Attention Is All You Need
Paper • 1706.03762 • Published • 34 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 11 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 9
-
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Paper • 2303.16634 • Published • 1 -
miracl/miracl-corpus
Viewer • Updated • 5.43k • 39 -
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 20 -
How is ChatGPT's behavior changing over time?
Paper • 2307.09009 • Published • 22