06051615

This model is a fine-tuned version of Qwen/Qwen1.5-7B-Chat on my own dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9018

Model description

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

  • 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
  • Significant performance improvement in Chat models;
  • Multilingual support of both base and chat models;
  • Stable support of 32K context length for models of all sizes;
  • No need for trust_remote_code.

For more details, please refer to the blog post and GitHub repo.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 700
  • num_epochs: 5.0
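
The derived values in the list above (total_train_batch_size and total_eval_batch_size) follow from the per-device settings. As a quick check, here is a minimal sketch of that arithmetic, together with a plain-Python approximation of a linear-warmup cosine schedule. The function names are illustrative, and the value total_steps=7300 used in the note after the code is an assumption estimated from the step/epoch ratio in the training results, not a number reported in this card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1
eval_batch_size = 1
num_devices = 2
gradient_accumulation_steps = 4
learning_rate = 1e-4
warmup_steps = 700


def effective_train_batch_size(per_device, devices, accumulation):
    """Number of samples contributing to one optimizer step."""
    return per_device * devices * accumulation


def cosine_lr(step, base_lr, warmup, total_steps):
    """Linear warmup followed by a half-cosine decay to zero.

    This is a sketch of the common warmup + cosine shape; the exact
    schedule used by the trainer may differ in details.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_train = effective_train_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
total_eval = eval_batch_size * num_devices
```

Under these assumptions, calling cosine_lr(step, learning_rate, warmup_steps, 7300) gives a rate that rises linearly over the first 700 steps, peaks at 0.0001, and decays toward zero by the assumed final step.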

Training results

Training Loss   Epoch    Step   Validation Loss
0.7655          0.4793   700    0.9256
0.8703          0.9586   1400   0.9017
0.725           1.4379   2100   0.9006
0.7958          1.9172   2800   0.8908
0.7346          2.3964   3500   0.8911
0.6516          2.8757   4200   0.8911
1.0524          3.3550   4900   0.9006
1.1005          3.8343   5600   0.8945
0.7991          4.3136   6300   0.9009
0.7668          4.7929   7000   0.9016
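
The rows above can be inspected programmatically, for example to find the evaluation step with the lowest validation loss. The following is a minimal sketch with the table values typed in by hand. The variable names are illustrative, and the dataset-size figure is only a rough estimate that assumes every optimizer step consumes a full batch of 8 samples.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
rows = [
    (0.7655, 0.4793, 700, 0.9256),
    (0.8703, 0.9586, 1400, 0.9017),
    (0.725, 1.4379, 2100, 0.9006),
    (0.7958, 1.9172, 2800, 0.8908),
    (0.7346, 2.3964, 3500, 0.8911),
    (0.6516, 2.8757, 4200, 0.8911),
    (1.0524, 3.3550, 4900, 0.9006),
    (1.1005, 3.8343, 5600, 0.8945),
    (0.7991, 4.3136, 6300, 0.9009),
    (0.7668, 4.7929, 7000, 0.9016),
]

# Row with the lowest validation loss.
best = min(rows, key=lambda r: r[3])

# Estimate optimizer steps per epoch from the step/epoch ratio of each row.
ratios = [step / epoch for _, epoch, step, _ in rows]
steps_per_epoch = sum(ratios) / len(ratios)

# Rough training-set size, ASSUMING 8 samples per optimizer step
# (total_train_batch_size) and no partially filled batches.
approx_train_examples = steps_per_epoch * 8

print(f"best step: {best[2]} (epoch {best[1]}, val loss {best[3]})")
print(f"approx. steps per epoch: {steps_per_epoch:.1f}")
```

In this table the lowest validation loss (0.8908) is logged at step 2800, a little before the end of the second epoch; later evaluations stay within roughly 0.01 of that value.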

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • PyTorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.19.1