TRL - Transformer Reinforcement Learning

TRL is a full-stack library that provides tools for training transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
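To give a feel for what training with one of these methods can look like, here is a minimal sketch of supervised fine-tuning with `SFTTrainer`. The model id, dataset id, output directory, and the `run_sft` helper are illustrative assumptions and do not come from this page; swap in your own. Treat it as a starting point under those assumptions, not a recommended configuration.

```python
# Minimal sketch of supervised fine-tuning (SFT) with TRL.
# Assumptions for illustration only: the checkpoint and dataset ids below,
# the output directory, and the helper name `run_sft` are not from this page.

MODEL_ID = "Qwen/Qwen2.5-0.5B"    # assumed example checkpoint on the Hub
DATASET_ID = "trl-lib/Capybara"   # assumed example conversational dataset


def run_sft(model_id=MODEL_ID, dataset_id=DATASET_ID, output_dir="sft-output"):
    """Fine-tune `model_id` on the train split of `dataset_id`; return the trainer."""
    # Imported lazily so this module can be loaded without trl/datasets installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    train_dataset = load_dataset(dataset_id, split="train")

    trainer = SFTTrainer(
        model=model_id,                        # a Hub id; the trainer loads the model
        args=SFTConfig(output_dir=output_dir),  # default hyperparameters otherwise
        train_dataset=train_dataset,
    )
    trainer.train()
    return trainer
```

Calling `run_sft()` downloads the model and dataset and starts training, so it needs network access and is best run on a machine with a GPU. Other trainers follow the same pattern of a config object plus a trainer, though their required arguments may differ.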

Below is the current list of TRL trainers, organized by method type (⚡️ = vLLM support).

Taxonomy

Online methods

Reward modeling

Offline methods

Knowledge distillation

GKDTrainer

🎉 What’s New

✨ OpenAI GPT OSS Support: TRL now fully supports fine-tuning the latest OpenAI GPT OSS models! Check out the:

You can also explore TRL-related models, datasets, and demos in the TRL Hugging Face organization.

Learn

Learn post-training with TRL and other libraries in the 🤗 smol course.

The documentation is organized into the following sections:

Getting Started: installation and quickstart guide.
Conceptual Guides: dataset formats, training FAQ, and understanding logs.
How-to Guides: reducing memory usage, speeding up training, distributing training, etc.
Integrations: DeepSpeed, Liger Kernel, PEFT, etc.
Examples: example overview, community tutorials, etc.
API: trainers, utils, etc.