License: MIT

0506_7_7

This model is a fine-tuned version of ../../models/Qwen1.5-7B-sft-0502 on the alpaca_formatted_review_new_data_0505_greater_7 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7221
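
Since this checkpoint was trained with PEFT (see the framework versions below), it ships as adapter weights that are loaded on top of the base model rather than as a full standalone model. A minimal loading sketch, assuming the base-model path from the description above and using "WDong/0506_7_7" as a hypothetical identifier for this adapter:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "../../models/Qwen1.5-7B-sft-0502"  # base model path from the description above
ADAPTER = "WDong/0506_7_7"                 # hypothetical repo/directory name for this adapter

# Load the base model, then attach the fine-tuned adapter weights.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```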

Model description

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

  • 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
  • Significant performance improvement in Chat models;
  • Multilingual support of both base and chat models;
  • Stable support of 32K context length for models of all sizes;
  • No need for trust_remote_code (see the usage sketch below).

For more details, please refer to the blog post and GitHub repo.
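
As an illustration of the last two points, a recent Transformers release (>= 4.37) can load and run a Qwen1.5 chat model directly, with no trust_remote_code flag. A minimal sketch using the public Qwen/Qwen1.5-7B-Chat checkpoint (not this fine-tuned adapter):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen1.5 is supported natively by Transformers, so trust_remote_code is not required.
model_id = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a one-sentence summary of Qwen1.5."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```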

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.0003
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 5.0
  • mixed_precision_training: Native AMP
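
For reference, the settings above correspond roughly to the Trainer configuration below. This is a hedged sketch, not the exact launch command: the output directory is an assumption, the model/dataset wiring is omitted, and the run was distributed across 2 GPUs, so a per-device batch size of 2 with 4 gradient-accumulation steps yields the total train batch size of 16.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters listed above.
# output_dir is an assumption; model, dataset, and Trainer setup are omitted.
training_args = TrainingArguments(
    output_dir="outputs/0506_7_7",
    learning_rate=3e-4,
    per_device_train_batch_size=2,   # x 2 GPUs x 4 accumulation steps = 16 total
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    seed=42,
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # Native AMP mixed precision
)
```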

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7981        | 0.2768 | 20   | 0.6501          |
| 0.7391        | 0.5536 | 40   | 0.6358          |
| 0.744         | 0.8304 | 60   | 0.6277          |
| 0.6284        | 1.1073 | 80   | 0.6241          |
| 0.7339        | 1.3841 | 100  | 0.6303          |
| 0.8346        | 1.6609 | 120  | 0.6408          |
| 0.6927        | 1.9377 | 140  | 0.6391          |
| 0.4915        | 2.2145 | 160  | 0.6543          |
| 0.7845        | 2.4913 | 180  | 0.6596          |
| 0.6619        | 2.7682 | 200  | 0.6587          |
| 0.4897        | 3.0450 | 220  | 0.6679          |
| 0.5064        | 3.3218 | 240  | 0.6951          |
| 0.6467        | 3.5986 | 260  | 0.6997          |
| 0.6615        | 3.8754 | 280  | 0.6985          |
| 0.4954        | 4.1522 | 300  | 0.7111          |
| 0.5624        | 4.4291 | 320  | 0.7216          |
| 0.5554        | 4.7059 | 340  | 0.7218          |
| 0.6798        | 4.9827 | 360  | 0.7221          |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.19.1