metadata

license: mit

0506_7_7

This model is a fine-tuned version of ../../models/Qwen1.5-7B-sft-0502 on the alpaca_formatted_review_new_data_0505_greater_7 dataset. It achieves the following results on the evaluation set:

Loss: 0.7221

Model description

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
Significant performance improvement in Chat models;
Multilingual support of both base and chat models;
Stable support of 32K context length for models of all sizes;
No need for trust_remote_code.

For more details, please refer to the blog post and GitHub repo.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 2
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 16
total_eval_batch_size: 2
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 20
num_epochs: 5.0
mixed_precision_training: Native AMP
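
The totals in the list above follow from the per-device values, and the cosine schedule with warmup can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes the usual linear-warmup-then-cosine-decay shape with a single half cycle, and it assumes 360 total optimizer steps (the last step logged in the training results), which the card does not state explicitly. The function names are hypothetical helpers, not part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 2   # per device
EVAL_BATCH_SIZE = 1    # per device
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 20
TOTAL_STEPS = 360      # ASSUMPTION: taken from the last logged training step


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    total_train = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
    total_eval = EVAL_BATCH_SIZE * NUM_DEVICES  # no accumulation at eval time
    return total_train, total_eval


def cosine_lr(step, peak_lr=LEARNING_RATE,
              warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0.

    ASSUMPTION: one half cosine cycle over the post-warmup steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_sizes())
    for s in (0, 10, 20, 190, 360):
        print(s, cosine_lr(s))
```

Under these assumptions the learning rate reaches its peak of 0.0003 at step 20, falls to half of that around step 190, and decays to zero at step 360.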

Training results

Training Loss	Epoch	Step	Validation Loss
0.7981	0.2768	20	0.6501
0.7391	0.5536	40	0.6358
0.744	0.8304	60	0.6277
0.6284	1.1073	80	0.6241
0.7339	1.3841	100	0.6303
0.8346	1.6609	120	0.6408
0.6927	1.9377	140	0.6391
0.4915	2.2145	160	0.6543
0.7845	2.4913	180	0.6596
0.6619	2.7682	200	0.6587
0.4897	3.0450	220	0.6679
0.5064	3.3218	240	0.6951
0.6467	3.5986	260	0.6997
0.6615	3.8754	280	0.6985
0.4954	4.1522	300	0.7111
0.5624	4.4291	320	0.7216
0.5554	4.7059	340	0.7218
0.6798	4.9827	360	0.7221
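
The table above can be scanned for the checkpoint with the lowest validation loss, which is one way to see where additional epochs stopped helping. The following is a minimal sketch with the rows copied from the table; `best_checkpoint` and `final_gap` are hypothetical helper names introduced here for illustration.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
ROWS = [
    (0.7981, 0.2768, 20, 0.6501),
    (0.7391, 0.5536, 40, 0.6358),
    (0.7440, 0.8304, 60, 0.6277),
    (0.6284, 1.1073, 80, 0.6241),
    (0.7339, 1.3841, 100, 0.6303),
    (0.8346, 1.6609, 120, 0.6408),
    (0.6927, 1.9377, 140, 0.6391),
    (0.4915, 2.2145, 160, 0.6543),
    (0.7845, 2.4913, 180, 0.6596),
    (0.6619, 2.7682, 200, 0.6587),
    (0.4897, 3.0450, 220, 0.6679),
    (0.5064, 3.3218, 240, 0.6951),
    (0.6467, 3.5986, 260, 0.6997),
    (0.6615, 3.8754, 280, 0.6985),
    (0.4954, 4.1522, 300, 0.7111),
    (0.5624, 4.4291, 320, 0.7216),
    (0.5554, 4.7059, 340, 0.7218),
    (0.6798, 4.9827, 360, 0.7221),
]


def best_checkpoint(rows):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    _, _, step, val_loss = min(rows, key=lambda r: r[3])
    return step, val_loss


def final_gap(rows):
    """Last logged validation loss minus the minimum, rounded to 4 decimals."""
    _, best_loss = best_checkpoint(rows)
    return round(rows[-1][3] - best_loss, 4)


if __name__ == "__main__":
    step, loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {loss} at step {step}")
    print(f"final minus best: {final_gap(ROWS)}")
```

On these rows the minimum is 0.6241 at step 80 (epoch 1.1073), and the validation loss rises from there to the reported final value of 0.7221.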

Framework versions

PEFT 0.10.0
Transformers 4.40.0
Pytorch 2.1.0+cu121
Datasets 2.14.5
Tokenizers 0.19.1
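
Because PEFT appears in the framework versions, the checkpoint is presumably a PEFT adapter trained on top of the base model rather than a full set of weights; the card does not confirm this. Under that assumption, loading could be sketched as follows. The adapter path is a hypothetical placeholder, the base path is the relative path named at the top of this card, and `load_adapter_model` is an illustrative helper, not part of any library.

```python
# ASSUMPTION: the checkpoint is a PEFT adapter; the card does not say so directly.
BASE_MODEL_PATH = "../../models/Qwen1.5-7B-sft-0502"  # from the card
ADAPTER_PATH = "path/to/0506_7_7"                     # hypothetical placeholder


def load_adapter_model(base_model_path=BASE_MODEL_PATH, adapter_path=ADAPTER_PATH):
    """Load the tokenizer and base model, then attach the adapter weights."""
    # Imported lazily so this file can be read without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype="auto",  # use the dtype stored in the checkpoint config
    )
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` requires both paths to exist locally. As the model description notes, Qwen1.5 does not need trust_remote_code, so the flag is omitted here.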