Tags: Safetensors, dpo, vietnamese, alignment, lora

lab22-dpo-vn

DPO-aligned Vietnamese LLM · VinUni AICB Program · Lab 22.

| Item | Value |
| --- | --- |
| Base model | {BASE_MODEL} |
| SFT data | 5CD-AI/Vietnamese-alpaca-cleaned (1k samples, 1 epoch) |
| DPO data | argilla/ultrafeedback-binarized-preferences-cleaned (2k pairs) |
| beta | {BETA} |
| lr | {LR} |
| LoRA r | 16 (alpha=32) |
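
The beta entry above controls how strongly the DPO objective penalizes drift from the reference (SFT) model. As a minimal sketch, assuming the standard DPO loss for a single preference pair, the computation looks roughly like the following. The function names are hypothetical, and the beta value of 0.1 in the usage note is only a placeholder, since the card leaves {BETA} unfilled. This is not the lab's training code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each *_logp argument is the summed token log-probability of a full
    response under the policy or the frozen reference model.
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response out-scores the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)
```

With a placeholder `beta=0.1`, a policy identical to the reference gives a loss of `log(2)`, and the loss falls below that once the policy assigns relatively more probability to the chosen response than the reference does.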

Model tree for Nguyen11/lab22-dpo-vn

Base model: Qwen/Qwen2.5-3B
Adapter (46): this model
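
Since the model tree lists this repository as an adapter on Qwen/Qwen2.5-3B, one plausible way to use it is to load the base model and attach the adapter with PEFT. The sketch below assumes the repository holds a standard PEFT LoRA adapter and that the tokenizer is taken from the base model. The helper names `load_model` and `generate` are hypothetical, and the card does not specify a prompt format.

```python
BASE_ID = "Qwen/Qwen2.5-3B"           # base model listed in the model tree
ADAPTER_ID = "Nguyen11/lab22-dpo-vn"  # this repository (assumed PEFT adapter)


def load_model(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (downloads weights)."""
    # Imported lazily so the constants above are usable without
    # the heavy dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Assumption: the tokenizer comes from the base model repository.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for a plain-text prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_model()` followed by `generate(tokenizer, model, "Xin chào!")` would download the weights and produce a completion; the prompt here is only an illustration.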

Dataset used to train Nguyen11/lab22-dpo-vn