# Trl

## Docs

- [SDPO](https://huggingface.co/docs/trl/v1.7.0/sdpo_trainer.md)
- [RLOO Trainer](https://huggingface.co/docs/trl/v1.7.0/rloo_trainer.md)
- [GRPO With Replay Buffer](https://huggingface.co/docs/trl/v1.7.0/grpo_with_replay_buffer.md)
- [GMPO](https://huggingface.co/docs/trl/v1.7.0/gmpo.md)
- [PEFT Integration](https://huggingface.co/docs/trl/v1.7.0/peft_integration.md)
- [CPO Trainer](https://huggingface.co/docs/trl/v1.7.0/cpo_trainer.md)
- [A2PO](https://huggingface.co/docs/trl/v1.7.0/a2po_trainer.md)
- [TRL - Transformers Reinforcement Learning](https://huggingface.co/docs/trl/v1.7.0/index.md)
- [Post-Training Toolkit Integration](https://huggingface.co/docs/trl/v1.7.0/ptt_integration.md)
- [DPO Trainer](https://huggingface.co/docs/trl/v1.7.0/dpo_trainer.md)
- [Harbor Integration for Training LLMs with Environments](https://huggingface.co/docs/trl/v1.7.0/harbor.md)
- [Usage Stats Collection](https://huggingface.co/docs/trl/v1.7.0/usage_stats.md)
- [DeepSpeed Integration](https://huggingface.co/docs/trl/v1.7.0/deepspeed_integration.md)
- [Data Utilities](https://huggingface.co/docs/trl/v1.7.0/data_utils.md)
- [Reducing Memory Usage](https://huggingface.co/docs/trl/v1.7.0/reducing_memory_usage.md)
- [Distillation Trainer](https://huggingface.co/docs/trl/v1.7.0/distillation_trainer.md)
- [Chat template utilities](https://huggingface.co/docs/trl/v1.7.0/chat_template_utils.md)
- [PRM Trainer](https://huggingface.co/docs/trl/v1.7.0/prm_trainer.md)
- [GFPO](https://huggingface.co/docs/trl/v1.7.0/gfpo.md)
- [Callbacks](https://huggingface.co/docs/trl/v1.7.0/callbacks.md)
- [Training customization](https://huggingface.co/docs/trl/v1.7.0/customization.md)
- [SSD](https://huggingface.co/docs/trl/v1.7.0/ssd_trainer.md)
- [Nash-MD Trainer](https://huggingface.co/docs/trl/v1.7.0/nash_md_trainer.md)
- [vLLM Integration](https://huggingface.co/docs/trl/v1.7.0/vllm_integration.md)
- [OpenReward Integration for Training LLMs with Environments](https://huggingface.co/docs/trl/v1.7.0/openreward.md)
- [MergeModelCallback](https://huggingface.co/docs/trl/v1.7.0/merge_model_callback.md)
- [GRPO Trainer](https://huggingface.co/docs/trl/v1.7.0/grpo_trainer.md)
- [TPO Trainer](https://huggingface.co/docs/trl/v1.7.0/tpo_trainer.md)
- [Dataset formats and types](https://huggingface.co/docs/trl/v1.7.0/dataset_formats.md)
- [Asynchronous GRPO](https://huggingface.co/docs/trl/v1.7.0/async_grpo_trainer.md)
- [PPO Trainer](https://huggingface.co/docs/trl/v1.7.0/ppo_trainer.md)
- [Online DPO Trainer](https://huggingface.co/docs/trl/v1.7.0/online_dpo_trainer.md)
- [Liger Kernel Integration](https://huggingface.co/docs/trl/v1.7.0/liger_kernel_integration.md)
- [SDFT](https://huggingface.co/docs/trl/v1.7.0/sdft_trainer.md)
- [Unsloth Integration](https://huggingface.co/docs/trl/v1.7.0/unsloth_integration.md)
- [BEMA for Reference Model](https://huggingface.co/docs/trl/v1.7.0/bema_for_reference_model.md)
- [SFT Trainer](https://huggingface.co/docs/trl/v1.7.0/sft_trainer.md)
- [BCO Trainer](https://huggingface.co/docs/trl/v1.7.0/bco_trainer.md)
- [PAPO Trainer](https://huggingface.co/docs/trl/v1.7.0/papo_trainer.md)
- [MiniLLM Trainer](https://huggingface.co/docs/trl/v1.7.0/minillm_trainer.md)
- [Kernels Hub Integration and Usage](https://huggingface.co/docs/trl/v1.7.0/kernels_hub.md)
- [OpenEnv Integration for Training LLMs with Environments](https://huggingface.co/docs/trl/v1.7.0/openenv.md)
- [Trackio Integration](https://huggingface.co/docs/trl/v1.7.0/trackio_integration.md)
- [Community Tutorials](https://huggingface.co/docs/trl/v1.7.0/community_tutorials.md)
- [Scripts Utilities](https://huggingface.co/docs/trl/v1.7.0/script_utils.md)
- [Paper Index](https://huggingface.co/docs/trl/v1.7.0/paper_index.md)
- [Training with Jobs](https://huggingface.co/docs/trl/v1.7.0/jobs_training.md)
- [Reward Modeling](https://huggingface.co/docs/trl/v1.7.0/reward_trainer.md)
- [ORPO Trainer](https://huggingface.co/docs/trl/v1.7.0/orpo_trainer.md)
- [Examples](https://huggingface.co/docs/trl/v1.7.0/example_overview.md)
- [GSPO-token](https://huggingface.co/docs/trl/v1.7.0/gspo_token.md)
- [General Online Logit Distillation (GOLD) Trainer](https://huggingface.co/docs/trl/v1.7.0/gold_trainer.md)
- [Quickstart](https://huggingface.co/docs/trl/v1.7.0/quickstart.md)
- [Experimental](https://huggingface.co/docs/trl/v1.7.0/experimental_overview.md)
- [Generalized Knowledge Distillation Trainer](https://huggingface.co/docs/trl/v1.7.0/gkd_trainer.md)
- [Installation](https://huggingface.co/docs/trl/v1.7.0/installation.md)
- [Reward Functions](https://huggingface.co/docs/trl/v1.7.0/rewards.md)
- [Command Line Interfaces (CLIs)](https://huggingface.co/docs/trl/v1.7.0/clis.md)
- [Speeding Up Training](https://huggingface.co/docs/trl/v1.7.0/speeding_up_training.md)
- [RapidFire AI Integration](https://huggingface.co/docs/trl/v1.7.0/rapidfire_integration.md)
- [Chat Templates](https://huggingface.co/docs/trl/v1.7.0/chat_templates.md)
- [Use model after training](https://huggingface.co/docs/trl/v1.7.0/use_model.md)
- [KTO Trainer](https://huggingface.co/docs/trl/v1.7.0/kto_trainer.md)
- [LoRA Without Regret](https://huggingface.co/docs/trl/v1.7.0/lora_without_regret.md)
- [XPO Trainer](https://huggingface.co/docs/trl/v1.7.0/xpo_trainer.md)
- [Distributing Training](https://huggingface.co/docs/trl/v1.7.0/distributing_training.md)