
mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-dpo-lora-debug

This model is a fine-tuned version of skim-wmt24/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-hf on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Rewards/chosen: 2.2801
  • Rewards/rejected: -39.1533
  • Rewards/accuracies: 1.0
  • Rewards/margins: 41.4334
  • Logps/rejected: -445.0770
  • Logps/chosen: -359.7921
  • Logits/rejected: -1.3840
  • Logits/chosen: -0.5887
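
This repository appears to contain a PEFT (LoRA) adapter rather than full model weights, so it is normally loaded on top of the base checkpoint named above. The snippet below is a minimal, illustrative sketch, assuming the base and adapter repository IDs shown on this page; it is not an official usage example, and the dtype/device settings may need adjusting for your setup.

```python
# Minimal sketch: load the base model and apply this LoRA adapter with PEFT.
# Repository IDs are taken from this page; the prompt and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "skim-wmt24/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-hf"
adapter_id = "shirayukikun/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-dpo-lora-debug"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attaches the LoRA weights

prompt = "Hello, how are you?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```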

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 1000
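
The reward and margin metrics above, together with the "dpo-lora" naming, suggest these hyperparameters were passed to a TRL-style DPO trainer over a PEFT LoRA adapter, though the card does not say so explicitly. The sketch below shows how they could map onto transformers.TrainingArguments and trl.DPOTrainer (assuming a TRL version contemporaneous with the Transformers 4.41.2 listed below); the dataset path, LoRA settings, and DPO beta are placeholders, not values from this card.

```python
# Illustrative mapping of the listed hyperparameters onto a TRL-style DPO setup.
# Dataset path, LoRA settings, and beta are assumptions, not taken from this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "skim-wmt24/mistral-llm-recipes-en-ja-continuous-pretrained-v1-dev-finetune-docs-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Preference data with "prompt"/"chosen"/"rejected" columns (placeholder file name).
dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

training_args = TrainingArguments(
    output_dir="dpo-lora-debug",
    learning_rate=5e-5,
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=2,    # eval_batch_size: 2
    gradient_accumulation_steps=2,   # total_train_batch_size: 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio: 0.1
    max_steps=1000,                  # training_steps: 1000
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed LoRA settings

trainer = DPOTrainer(
    model=model,
    ref_model=None,            # with a PEFT adapter, the frozen base model acts as the reference
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,   # DPO beta left at the trainer default (not reported on this card)
)
trainer.train()
```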

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2107 | 100  | 0.0000          | 2.3145         | -32.8393         | 1.0                | 35.1539         | -381.9371      | -359.4480    | -1.3607         | -0.5569       |
| 0.0           | 0.4215 | 200  | 0.0000          | 2.3217         | -36.0030         | 1.0                | 38.3247         | -413.5737      | -359.3759    | -1.3774         | -0.5735       |
| 0.0           | 0.6322 | 300  | 0.0000          | 2.2793         | -38.2847         | 1.0                | 40.5641         | -436.3912      | -359.8001    | -1.3815         | -0.5881       |
| 0.0           | 0.8430 | 400  | 0.0000          | 2.2847         | -38.6597         | 1.0                | 40.9444         | -440.1409      | -359.7462    | -1.3831         | -0.5885       |
| 0.0           | 1.0537 | 500  | 0.0000          | 2.2864         | -38.7575         | 1.0                | 41.0438         | -441.1187      | -359.7298    | -1.3832         | -0.5886       |
| 0.0           | 1.2645 | 600  | 0.0000          | 2.2828         | -38.9276         | 1.0                | 41.2104         | -442.8197      | -359.7650    | -1.3838         | -0.5891       |
| 0.0           | 1.4752 | 700  | 0.0000          | 2.2807         | -38.9749         | 1.0                | 41.2556         | -443.2929      | -359.7865    | -1.3838         | -0.5890       |
| 0.0           | 1.6860 | 800  | 0.0000          | 2.2817         | -39.1524         | 1.0                | 41.4341         | -445.0674      | -359.7761    | -1.3833         | -0.5885       |
| 0.0           | 1.8967 | 900  | 0.0000          | 2.2829         | -39.1605         | 1.0                | 41.4434         | -445.1483      | -359.7638    | -1.3833         | -0.5887       |
| 0.0           | 2.1075 | 1000 | 0.0000          | 2.2801         | -39.1533         | 1.0                | 41.4334         | -445.0770      | -359.7921    | -1.3840         | -0.5887       |
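
For readers unfamiliar with these columns: in DPO, the reported rewards are the β-scaled log-probability ratios of the policy against the reference model, and the margin and accuracy columns follow from them. This is the standard DPO definition, not something stated on this card, and β itself is not reported here.

```latex
% Standard DPO implicit rewards (beta-scaled log-probability ratios); beta is not reported on this card.
r_{\mathrm{chosen}}   = \beta \bigl( \log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x) \bigr), \qquad
r_{\mathrm{rejected}} = \beta \bigl( \log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x) \bigr)

\text{margins}    = r_{\mathrm{chosen}} - r_{\mathrm{rejected}}, \qquad
\text{accuracies} = \mathbb{E}\bigl[ \mathbf{1}\{ r_{\mathrm{chosen}} > r_{\mathrm{rejected}} \} \bigr]
```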

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0a0+ebedce2
  • Datasets 2.20.0
  • Tokenizers 0.19.1