Callbacks

SyncRefModelCallback

( ref_model: Union, accelerator: Optional )

RichProgressCallback

( )

A TrainerCallback that displays the progress of training or evaluation using Rich.

WinRateCallback

( prompts: List[str], judge: BaseRankJudge, trainer: Trainer, generation_config: Optional[GenerationConfig] = None, batch_size: int = 4 )

Parameters

prompts (List[str]) — The prompts to generate completions for.
judge (BaseRankJudge) — The judge to use for comparing completions.
trainer (Trainer) — The trainer.
generation_config (GenerationConfig, optional) — The generation config to use for generating completions.
batch_size (int, optional) — The batch size to use for generating completions. Defaults to 4.

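A TrainerCallback that computes the win rate of the model being trained against a reference. Completions are generated for each prompt, and the judge decides which completion is preferred.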

Usage:

trainer = DPOTrainer(...)
win_rate_callback = WinRateCallback(..., trainer=trainer)
trainer.add_callback(win_rate_callback)
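
To make the "win rate" reported by the callback concrete, here is a minimal, library-free sketch of the arithmetic. It is not TRL's implementation. The `LongerCompletionJudge` class, the `win_rate` helper, the `[reference, model]` pairing order, and the sample data are all hypothetical and exist only for illustration. A real judge would be a `BaseRankJudge` subclass, whose interface may differ.

```python
from typing import List, Sequence


class LongerCompletionJudge:
    """Hypothetical stand-in for a judge: prefers the longer completion."""

    def judge(self, prompts: List[str], completions: List[List[str]]) -> List[int]:
        # For each prompt, return the index (0 or 1) of the preferred completion.
        return [0 if len(pair[0]) >= len(pair[1]) else 1 for pair in completions]


def win_rate(winner_indices: Sequence[int], model_index: int = 1) -> float:
    """Fraction of prompts on which the completion at `model_index` was preferred."""
    if not winner_indices:
        raise ValueError("need at least one judged pair")
    wins = sum(1 for i in winner_indices if i == model_index)
    return wins / len(winner_indices)


prompts = ["Say hi.", "Name a colour.", "Count to three.", "Name a fruit."]

# Assumed ordering: each pair is [reference completion, trained-model completion].
pairs = [
    ["Hi.", "Hello there!"],
    ["Blue.", "Red"],
    ["1 2 3", "One, two, three."],
    ["Apple", "A ripe mango"],
]

winners = LongerCompletionJudge().judge(prompts, pairs)
rate = win_rate(winners)
print(f"win rate: {rate:.2f}")
```

In this sketch the trained model's completion is preferred on three of the four prompts, so the reported win rate is 0.75. A length-based judge is only a placeholder: in practice the quality of the win-rate signal depends entirely on the judge passed to the callback.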