# 0502
This model is a fine-tuned version of /datas/huggingface/Qwen1.5-7B on the alpaca_formatted_ift_eft_dft_rft_2048 dataset. It achieves the following results on the evaluation set:
- Loss: 0.8510
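The framework versions below list PEFT 0.10.0, so this checkpoint is presumably a PEFT (e.g. LoRA) adapter rather than full model weights. A minimal loading sketch, assuming the adapter files live in a hypothetical local directory `./0502` and the base model sits at the path given above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model path as given in this card; substitute your own copy or
# the Hub id "Qwen/Qwen1.5-7B".
base_path = "/datas/huggingface/Qwen1.5-7B"

tokenizer = AutoTokenizer.from_pretrained(base_path)
base = AutoModelForCausalLM.from_pretrained(base_path, device_map="auto")

# "./0502" is a hypothetical directory containing this adapter's files.
model = PeftModel.from_pretrained(base, "./0502")
model.eval()
```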
## Model description
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
- 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
- Significant performance improvement in Chat models;
- Multilingual support of both base and chat models;
- Stable support of 32K context length for models of all sizes;
- No need for `trust_remote_code` (a loading sketch follows below).
For more details, please refer to the blog post and GitHub repo.
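As a quick illustration of the last point, the base model loads with the standard transformers auto classes and no `trust_remote_code=True` flag; this sketch assumes the public Hub id `Qwen/Qwen1.5-7B` and a transformers version with native Qwen1.5 support, as the 4.40.0 listed below is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen1.5 is natively supported, so no trust_remote_code flag is passed.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
```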
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5.5e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 5.0
- mixed_precision_training: Native AMP
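For reference, a sketch of how the list above maps onto transformers `TrainingArguments`; the output directory is hypothetical, and the multi-GPU totals (2 devices × batch 2 × accumulation 2 = 8) come from the distributed launcher rather than an explicit argument:

```python
from transformers import TrainingArguments

# Hypothetical output dir. With 2 GPUs, a per-device train batch of 2 and
# gradient accumulation of 2 yield the total train batch size of 8.
args = TrainingArguments(
    output_dir="./0502",
    learning_rate=5.5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=5.0,
    fp16=True,  # Native AMP mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```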
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.1252        | 0.2335 | 200  | 1.0653          |
| 1.0075        | 0.4670 | 400  | 0.9458          |
| 1.2782        | 0.7005 | 600  | 0.9099          |
| 0.8558        | 0.9340 | 800  | 0.8929          |
| 0.922         | 1.1675 | 1000 | 0.8817          |
| 0.8985        | 1.4011 | 1200 | 0.8758          |
| 0.8273        | 1.6346 | 1400 | 0.8700          |
| 0.9136        | 1.8681 | 1600 | 0.8655          |
| 0.9963        | 2.1016 | 1800 | 0.8614          |
| 1.0214        | 2.3351 | 2000 | 0.8597          |
| 0.8823        | 2.5686 | 2200 | 0.8569          |
| 0.9265        | 2.8021 | 2400 | 0.8557          |
| 0.8033        | 3.0356 | 2600 | 0.8541          |
| 0.992         | 3.2691 | 2800 | 0.8527          |
| 0.7903        | 3.5026 | 3000 | 0.8522          |
| 0.8686        | 3.7361 | 3200 | 0.8518          |
| 0.954         | 3.9696 | 3400 | 0.8515          |
| 0.6472        | 4.2032 | 3600 | 0.8513          |
| 0.8799        | 4.4367 | 3800 | 0.8510          |
| 0.9454        | 4.6702 | 4000 | 0.8510          |
| 0.9496        | 4.9037 | 4200 | 0.8510          |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.0
- PyTorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.19.1