- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs (Paper • 2310.13961 • Published • 4)
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation (Paper • 2202.07922 • Published • 1)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (Paper • 2310.13671 • Published • 17)
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs (Paper • 2309.09582 • Published • 4)
Collections including paper arxiv:2310.01377
- Moral Foundations of Large Language Models (Paper • 2310.15337 • Published • 1)
- Specific versus General Principles for Constitutional AI (Paper • 2310.13798 • Published • 2)
- Contrastive Prefence Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • Published • 21)
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Paper • 2309.00267 • Published • 45)
- Condition-Aware Neural Network for Controlled Image Generation (Paper • 2404.01143 • Published • 11)
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes (Paper • 2404.00987 • Published • 21)
- Advancing LLM Reasoning Generalists with Preference Trees (Paper • 2404.02078 • Published • 41)
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline (Paper • 2404.02893 • Published • 19)
- Transforming and Combining Rewards for Aligning Large Language Models (Paper • 2402.00742 • Published • 10)
- UltraFeedback: Boosting Language Models with High-quality Feedback (Paper • 2310.01377 • Published • 4)
- Learn Your Reference Model for Real Good Alignment (Paper • 2404.09656 • Published • 80)
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models (Paper • 2405.01535 • Published • 96)
- Fine-Tuning Language Models from Human Preferences (Paper • 1909.08593 • Published • 2)
- Transforming and Combining Rewards for Aligning Large Language Models (Paper • 2402.00742 • Published • 10)
- Leverage the Average: an Analysis of KL Regularization in RL (Paper • 2003.14089 • Published • 2)
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward (Paper • 2404.01258 • Published • 10)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Paper • 2305.11738 • Published • 3)
- Shepherd: A Critic for Language Model Generation (Paper • 2308.04592 • Published • 27)
- CriticBench: Benchmarking LLMs for Critique-Correct Reasoning (Paper • 2402.14809 • Published • 2)
- DRLC: Reinforcement Learning with Dense Rewards from LLM Critic (Paper • 2401.07382 • Published • 2)
- Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test (Paper • 2309.13356 • Published • 36)
- Unveiling Safety Vulnerabilities of Large Language Models (Paper • 2311.04124 • Published • 5)
- TrustLLM: Trustworthiness in Large Language Models (Paper • 2401.05561 • Published • 62)
- Evaluating Frontier Models for Dangerous Capabilities (Paper • 2403.13793 • Published • 7)