Qwen1.5-0.5B-Chat with EPFL DPO fine-tuning
Qwen1.5-0.5B-Chat fine-tuned with DPO on two sources: the Orca Math dataset, which consists of ~200K grade-school math word problems, and open-ended and multiple-choice questions from different EPFL courses.
Model Details
Model Description
The model was developed during the course Modern Natural Language Processing (CS-552). Its aim is to fine-tune the base model (Qwen/Qwen1.5-0.5B-Chat) to accurately answer open-ended and multiple-choice questions from the Orca Math dataset and from various EPFL courses.
- Developed by: Emma Lise Boehly, Ahmed Aziz Ben Haj Hmida and Jan Kokla
- Finetuned from model: Qwen/Qwen1.5-0.5B-Chat
Training Details
Training Data
Hugging Face dataset: microsoft/orca-math-word-problems-200k
The EPFL dataset is not publicly available.
Training Procedure
Training Hyperparameters
Training regime: cDPO (conservative DPO) with bf16 mixed precision, $\beta = 0.2$, learning rate $3 \times 10^{-6}$, and label smoothing $0.2$
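To make the role of $\beta$ and the label-smoothing value concrete, here is a minimal sketch of a per-example cDPO loss. It assumes the objective is the label-smoothed sigmoid DPO loss, $-(1-\epsilon)\log\sigma(\beta\Delta) - \epsilon\log\sigma(-\beta\Delta)$, where $\Delta$ is the difference between the policy-vs-reference log-ratios of the chosen and rejected answers. The function names and the scalar log-probability inputs are hypothetical and are not taken from the authors' training code.

```python
import math

BETA = 0.2             # DPO temperature, as listed in the training regime
LABEL_SMOOTHING = 0.2  # cDPO label smoothing, as listed in the training regime


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = BETA,
    label_smoothing: float = LABEL_SMOOTHING,
) -> float:
    """Per-example cDPO loss (assumed form: label-smoothed sigmoid DPO)."""
    # Difference of policy-vs-reference log-ratios for chosen and rejected answers.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    logits = beta * margin
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(logits)
        - label_smoothing * log_sigmoid(-logits)
    )
```

In this form, label smoothing treats each preference label as possibly flipped with probability 0.2, so the loss is minimized at a finite margin instead of pushing the margin ever larger. In practice the log-probabilities would be sums over answer tokens computed by the fine-tuned and reference models.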
Framework version: PEFT 0.10.0
Model tree for emmabhl/Qwen1.5-0.5B-Chat-EPFL-ORCA-cDPO
Base model
Qwen/Qwen1.5-0.5B-Chat