lab22-dpo-vn
DPO-aligned Vietnamese LLM · VinUni AICB Program · Lab 22.
| Item | Value |
|---|---|
| Base model | {BASE_MODEL} |
| SFT data | 5CD-AI/Vietnamese-alpaca-cleaned (1k samples, 1 epoch) |
| DPO data | argilla/ultrafeedback-binarized-preferences-cleaned (2k pairs) |
| beta | {BETA} |
| lr | {LR} |
| LoRA r | 16 (alpha=32) |
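
The settings above can be read as a small configuration. The sketch below is illustrative only and is not the lab's training script. It collects the values the table does state, and shows how `beta` enters the standard per-pair DPO loss. The names `LabConfig` and `dpo_loss`, and the `beta=0.1` used in the usage lines, are assumptions for illustration; the actual base model, beta, and lr remain the placeholders shown in the table. The LoRA scaling factor assumes the conventional `alpha / r` rule.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LabConfig:
    """Hypothetical container for the values stated in the table."""
    sft_dataset: str = "5CD-AI/Vietnamese-alpaca-cleaned"
    sft_samples: int = 1_000
    sft_epochs: int = 1
    dpo_dataset: str = "argilla/ultrafeedback-binarized-preferences-cleaned"
    dpo_pairs: int = 2_000
    lora_r: int = 16
    lora_alpha: int = 32

    @property
    def lora_scaling(self) -> float:
        # Conventional LoRA scaling applied to the low-rank update: alpha / r.
        return self.lora_alpha / self.lora_r


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float) -> float:
    """Per-pair DPO loss from sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


cfg = LabConfig()
print(cfg.lora_scaling)

# beta=0.1 is an illustrative value, not the one used in this lab.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1))
```

A larger `beta` penalizes drift from the reference model more strongly, and a smaller one lets the policy move further toward the preferred responses.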