S4nto/lora-dpo-finetuned-model-beta-0.1-rate-1e5-stage2-iter40000-sft Text Generation • Updated May 16 • 4