# 0502
This model is a fine-tuned version of /datas/huggingface/Qwen1.5-7B on the alpaca_formatted_ift_eft_dft_rft_2048 dataset. It achieves the following results on the evaluation set:
- Loss: 0.8510
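The framework versions below list PEFT 0.10.0, so this checkpoint is presumably a PEFT (e.g. LoRA) adapter rather than full model weights. A minimal loading sketch, assuming the adapter files live in a hypothetical local directory `./0502` and the base model sits at the path given above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model path as given in this card; substitute your own copy or
# the Hub id "Qwen/Qwen1.5-7B".
base_path = "/datas/huggingface/Qwen1.5-7B"

tokenizer = AutoTokenizer.from_pretrained(base_path)
base = AutoModelForCausalLM.from_pretrained(base_path, device_map="auto")

# "./0502" is a hypothetical directory containing this adapter's files.
model = PeftModel.from_pretrained(base, "./0502")
model.eval()
```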
## Model description
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
- 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
- Significant performance improvement in Chat models;
- Multilingual support of both base and chat models;
- Stable support of 32K context length for models of all sizes;
- No need for `trust_remote_code` (a loading sketch follows below).
For more details, please refer to the blog post and GitHub repo.
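As a quick illustration of the last point, the base model loads with the standard transformers auto classes and no `trust_remote_code=True` flag; this sketch assumes the public Hub id `Qwen/Qwen1.5-7B` and a transformers version with native Qwen1.5 support, as the 4.40.0 listed below is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen1.5 is natively supported, so no trust_remote_code flag is passed.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
```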
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5.5e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 5.0
- mixed_precision_training: Native AMP
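For reference, a sketch of how the list above maps onto transformers `TrainingArguments`; the output directory is hypothetical, and the multi-GPU totals (2 devices × batch 2 × accumulation 2 = 8) come from the distributed launcher rather than an explicit argument:

```python
from transformers import TrainingArguments

# Hypothetical output dir. With 2 GPUs, a per-device train batch of 2 and
# gradient accumulation of 2 yield the total train batch size of 8.
args = TrainingArguments(
    output_dir="./0502",
    learning_rate=5.5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=5.0,
    fp16=True,  # Native AMP mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```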
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.1252        | 0.2335 | 200  | 1.0653          |
| 1.0075        | 0.4670 | 400  | 0.9458          |
| 1.2782        | 0.7005 | 600  | 0.9099          |
| 0.8558        | 0.9340 | 800  | 0.8929          |
| 0.922         | 1.1675 | 1000 | 0.8817          |
| 0.8985        | 1.4011 | 1200 | 0.8758          |
| 0.8273        | 1.6346 | 1400 | 0.8700          |
| 0.9136        | 1.8681 | 1600 | 0.8655          |
| 0.9963        | 2.1016 | 1800 | 0.8614          |
| 1.0214        | 2.3351 | 2000 | 0.8597          |
| 0.8823        | 2.5686 | 2200 | 0.8569          |
| 0.9265        | 2.8021 | 2400 | 0.8557          |
| 0.8033        | 3.0356 | 2600 | 0.8541          |
| 0.992         | 3.2691 | 2800 | 0.8527          |
| 0.7903        | 3.5026 | 3000 | 0.8522          |
| 0.8686        | 3.7361 | 3200 | 0.8518          |
| 0.954         | 3.9696 | 3400 | 0.8515          |
| 0.6472        | 4.2032 | 3600 | 0.8513          |
| 0.8799        | 4.4367 | 3800 | 0.8510          |
| 0.9454        | 4.6702 | 4000 | 0.8510          |
| 0.9496        | 4.9037 | 4200 | 0.8510          |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.0
- PyTorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.19.1