0502

This model is a fine-tuned version of /datas/huggingface/Qwen1.5-7B on the alpaca_formatted_ift_eft_dft_rft_2048 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8510

Model description

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

  • 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B, and 72B dense models, and an MoE model of 14B with 2.7B activated;
  • Significant performance improvement in Chat models;
  • Multilingual support for both base and chat models;
  • Stable support of 32K context length for models of all sizes;
  • No need for trust_remote_code (see the loading sketch after this list).
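Because the Qwen1.5 architecture is integrated into transformers (4.37 and later), the base model loads without trust_remote_code. A minimal sketch, assuming the Hub id Qwen/Qwen1.5-7B corresponds to the local base-model path above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen1.5 is supported natively by transformers, so no
# trust_remote_code=True flag is needed when loading.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B",
    torch_dtype="auto",   # use the checkpoint's dtype (BF16 here)
    device_map="auto",    # requires the accelerate package
)

prompt = "Give me a short introduction to large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```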

For more details, please refer to the blog post and GitHub repo.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a reconstructed TrainingArguments sketch follows the list:

  • learning_rate: 5.5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 5.0
  • mixed_precision_training: Native AMP
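As a rough guide, these values map onto a transformers TrainingArguments object as sketched below. This is a reconstruction, not the original launch script: output_dir is a placeholder, the two-GPU distributed setup comes from the launcher (e.g. torchrun) rather than these arguments, and bf16=True is an assumption based on the checkpoint's BF16 tensors.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above.
args = TrainingArguments(
    output_dir="outputs/0502",       # placeholder, not from the original run
    learning_rate=5.5e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=1,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=2,   # 2 GPUs x 2 per device x 2 steps = 8 total
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=5.0,
    bf16=True,                       # Native AMP; BF16 assumed from the tensor type
)
```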

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.1252        | 0.2335 | 200  | 1.0653          |
| 1.0075        | 0.4670 | 400  | 0.9458          |
| 1.2782        | 0.7005 | 600  | 0.9099          |
| 0.8558        | 0.9340 | 800  | 0.8929          |
| 0.922         | 1.1675 | 1000 | 0.8817          |
| 0.8985        | 1.4011 | 1200 | 0.8758          |
| 0.8273        | 1.6346 | 1400 | 0.8700          |
| 0.9136        | 1.8681 | 1600 | 0.8655          |
| 0.9963        | 2.1016 | 1800 | 0.8614          |
| 1.0214        | 2.3351 | 2000 | 0.8597          |
| 0.8823        | 2.5686 | 2200 | 0.8569          |
| 0.9265        | 2.8021 | 2400 | 0.8557          |
| 0.8033        | 3.0356 | 2600 | 0.8541          |
| 0.992         | 3.2691 | 2800 | 0.8527          |
| 0.7903        | 3.5026 | 3000 | 0.8522          |
| 0.8686        | 3.7361 | 3200 | 0.8518          |
| 0.954         | 3.9696 | 3400 | 0.8515          |
| 0.6472        | 4.2032 | 3600 | 0.8513          |
| 0.8799        | 4.4367 | 3800 | 0.8510          |
| 0.9454        | 4.6702 | 4000 | 0.8510          |
| 0.9496        | 4.9037 | 4200 | 0.8510          |

Framework versions

  • PEFT 0.10.0 (see the adapter-loading sketch after this list)
  • Transformers 4.40.0
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.19.1
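
Training used PEFT 0.10.0, so the weights were produced as a parameter-efficient adapter on top of the base model. If the published checkpoint is a full merged model, it loads exactly like the base model above; a separate adapter checkpoint would instead load via PeftModel, as sketched here with the placeholder id your-namespace/0502:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B", torch_dtype="auto", device_map="auto"
)
# "your-namespace/0502" is a placeholder, not this model's actual repo id.
model = PeftModel.from_pretrained(base, "your-namespace/0502")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
```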