
0425

This model is a fine-tuned version of Qwen/Qwen1.5-7B on the alpaca_formatted_ift_eft_Justification dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8213

Model description

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previously released Qwen, the improvements include:

  • 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B, and 72B dense models, and an MoE model of 14B with 2.7B activated parameters;
  • Significant performance improvement in chat models;
  • Multilingual support for both base and chat models;
  • Stable support of 32K context length for models of all sizes;
  • No need for trust_remote_code (see the loading sketch below).

For more details, please refer to the blog post and GitHub repo.
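Because Qwen1.5 is integrated natively into recent releases of Transformers (4.37 and later), the base model loads without the trust_remote_code flag. A minimal loading sketch, assuming a recent transformers install and enough GPU memory for the 7B checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen1.5 is supported natively in Transformers >= 4.37,
# so trust_remote_code is not required.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B",
    torch_dtype="auto",   # choose dtype from the checkpoint and hardware
    device_map="auto",    # requires accelerate; spreads layers across GPUs
)
```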

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 12
  • total_eval_batch_size: 3
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 5.0
  • mixed_precision_training: Native AMP
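As a rough sketch of how these settings could map onto transformers.TrainingArguments (the exact training script is not documented in this card, so the argument mapping and output path below are assumptions):

```python
from transformers import TrainingArguments

# Hedged sketch only: reconstructs the reported hyperparameters as
# TrainingArguments; the actual fine-tuning script is not specified here.
args = TrainingArguments(
    output_dir="qwen1.5-7b-justification-sft",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    num_train_epochs=5.0,
    fp16=True,  # "Native AMP" mixed-precision training
)

# Effective batch sizes follow from the distributed setup:
#   total_train_batch_size = 2 (per device) * 3 (GPUs) * 2 (grad accumulation) = 12
#   total_eval_batch_size  = 1 (per device) * 3 (GPUs)                         = 3
```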

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0669        | 0.2018 | 100  | 0.8823          |
| 0.9156        | 0.4036 | 200  | 0.8593          |
| 0.9509        | 0.6054 | 300  | 0.8491          |
| 0.8287        | 0.8073 | 400  | 0.8423          |
| 0.8772        | 1.0091 | 500  | 0.8390          |
| 0.9101        | 1.2109 | 600  | 0.8385          |
| 0.8212        | 1.4127 | 700  | 0.8342          |
| 0.8721        | 1.6145 | 800  | 0.8327          |
| 1.0033        | 1.8163 | 900  | 0.8319          |
| 0.9879        | 2.0182 | 1000 | 0.8276          |
| 0.964         | 2.2200 | 1100 | 0.8276          |
| 0.8409        | 2.4218 | 1200 | 0.8264          |
| 0.8055        | 2.6236 | 1300 | 0.8262          |
| 1.0026        | 2.8254 | 1400 | 0.8240          |
| 0.881         | 3.0272 | 1500 | 0.8241          |
| 1.0058        | 3.2291 | 1600 | 0.8226          |
| 0.8747        | 3.4309 | 1700 | 0.8205          |
| 0.8686        | 3.6327 | 1800 | 0.8215          |
| 0.8838        | 3.8345 | 1900 | 0.8208          |
| 0.8246        | 4.0363 | 2000 | 0.8218          |
| 0.8727        | 4.2381 | 2100 | 0.8216          |
| 0.8737        | 4.4400 | 2200 | 0.8214          |
| 0.8955        | 4.6418 | 2300 | 0.8214          |
| 0.8909        | 4.8436 | 2400 | 0.8215          |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.19.1
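
The PEFT dependency above suggests this checkpoint is stored as an adapter rather than fully merged weights. A hedged usage sketch under that assumption; the adapter repo id below is a placeholder, not the actual published path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumption: this model ships as a PEFT adapter on top of Qwen/Qwen1.5-7B.
# Replace "your-username/0425" with the actual adapter repository or local path.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
model = PeftModel.from_pretrained(base, "your-username/0425")

prompt = "Explain your reasoning step by step: why is the sky blue?"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the adapter is intended for standalone deployment, it can also be folded into the base weights with merge_and_unload() after loading.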