# mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-dpo-lora-debug
This model is a DPO LoRA fine-tuned version of skim-wmt24/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-hf; the training dataset is not specified. It achieves the following results on the evaluation set (the reward columns are explained briefly after the list):
- Loss: 0.0000
- Rewards/chosen: 2.2801
- Rewards/rejected: -39.1533
- Rewards/accuracies: 1.0
- Rewards/margins: 41.4334
- Logps/rejected: -445.0770
- Logps/chosen: -359.7921
- Logits/rejected: -1.3840
- Logits/chosen: -0.5887
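
For orientation (this note is not part of the original card): in DPO, the logged reward of a response is the implicit reward derived from the policy-to-reference log-probability ratio, and the margin and accuracy columns follow from the chosen and rejected rewards, as conventionally computed by TRL's `DPOTrainer`:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]
$$

$$
\text{Rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}), \qquad
\text{Rewards/accuracies} = \Pr\!\left[ r_\theta(x, y_{\text{chosen}}) > r_\theta(x, y_{\text{rejected}}) \right]
$$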
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 1000
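
The original training script is not included in this card. As a rough orientation only, the hyperparameters above map onto a TRL `DPOTrainer` setup along the lines of the sketch below. The LoRA settings, the DPO `beta`, and the placeholder preference data are assumptions (the card names none of them), and exact argument names vary across TRL versions; the Adam betas and epsilon are the Transformers defaults, so they are not set explicitly.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "skim-wmt24/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameters taken from the list above.
args = DPOConfig(
    output_dir="dpo-lora-debug",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=1000,
    seed=42,
    beta=0.1,  # assumption: the DPO beta is not stated in this card
)

# Illustrative LoRA settings; the actual adapter hyperparameters are not listed here.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Placeholder preference pairs; the real training dataset is not named in this card.
train_dataset = Dataset.from_dict({
    "prompt": ["Translate into Japanese: Good morning."],
    "chosen": ["おはようございます。"],
    "rejected": ["Good morning."],
})

trainer = DPOTrainer(
    model,
    ref_model=None,       # with a PEFT adapter, the frozen base weights serve as the reference model
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL versions call this argument `processing_class`
    peft_config=peft_config,
)
trainer.train()
```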
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.2107 | 100 | 0.0000 | 2.3145 | -32.8393 | 1.0 | 35.1539 | -381.9371 | -359.4480 | -1.3607 | -0.5569 |
| 0.0 | 0.4215 | 200 | 0.0000 | 2.3217 | -36.0030 | 1.0 | 38.3247 | -413.5737 | -359.3759 | -1.3774 | -0.5735 |
| 0.0 | 0.6322 | 300 | 0.0000 | 2.2793 | -38.2847 | 1.0 | 40.5641 | -436.3912 | -359.8001 | -1.3815 | -0.5881 |
| 0.0 | 0.8430 | 400 | 0.0000 | 2.2847 | -38.6597 | 1.0 | 40.9444 | -440.1409 | -359.7462 | -1.3831 | -0.5885 |
| 0.0 | 1.0537 | 500 | 0.0000 | 2.2864 | -38.7575 | 1.0 | 41.0438 | -441.1187 | -359.7298 | -1.3832 | -0.5886 |
| 0.0 | 1.2645 | 600 | 0.0000 | 2.2828 | -38.9276 | 1.0 | 41.2104 | -442.8197 | -359.7650 | -1.3838 | -0.5891 |
| 0.0 | 1.4752 | 700 | 0.0000 | 2.2807 | -38.9749 | 1.0 | 41.2556 | -443.2929 | -359.7865 | -1.3838 | -0.5890 |
| 0.0 | 1.6860 | 800 | 0.0000 | 2.2817 | -39.1524 | 1.0 | 41.4341 | -445.0674 | -359.7761 | -1.3833 | -0.5885 |
| 0.0 | 1.8967 | 900 | 0.0000 | 2.2829 | -39.1605 | 1.0 | 41.4434 | -445.1483 | -359.7638 | -1.3833 | -0.5887 |
| 0.0 | 2.1075 | 1000 | 0.0000 | 2.2801 | -39.1533 | 1.0 | 41.4334 | -445.0770 | -359.7921 | -1.3840 | -0.5887 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0a0+ebedce2
- Datasets 2.20.0
- Tokenizers 0.19.1
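
With these libraries installed, the LoRA adapter can be loaded on top of the base model roughly as in the sketch below. This is a hedged example: the adapter repository id is a placeholder (the card does not give the full repo path), and the prompt is only illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "skim-wmt24/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-hf"
# Placeholder: replace with the actual adapter repo id or a local path to the adapter.
adapter_id = "<your-namespace>/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-dpo-lora-debug"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO LoRA adapter
model.eval()

prompt = "Translate into Japanese: The weather is nice today."  # illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```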