WDong committed on
Commit
ce1df79
1 Parent(s): 857f8aa

Update README.md

Files changed (1)
  1. README.md +50 -1
README.md CHANGED
@@ -16,13 +16,62 @@ should probably proofread and complete it, then remove this comment. -->
 
  # 06051615
 
- This model is a fine-tuned version of [/datas/huggingface/Qwen1.5-7B-Chat/](https://huggingface.co//datas/huggingface/Qwen1.5-7B-Chat/) on the alpaca_formatted_ift_eft_dft_rft_share5k_2048 dataset.
+ This model is a fine-tuned version of [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) on my own dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.9018
 
  ## Model description
 
+ Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previously released Qwen, the improvements include:
+ * 8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
+ * Significant performance improvement in Chat models;
+ * Multilingual support of both base and chat models;
+ * Stable support of 32K context length for models of all sizes;
+ * No need of `trust_remote_code`.
+ For more details, please refer to the [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5).
+
+ ## Intended uses & limitations
+ More information needed
+ ## Training and evaluation data
  More information needed
+ ## Training procedure
+ ### Training hyperparameters
+ The following hyperparameters were used during training:
+ - learning_rate: 7e-05
+ - train_batch_size: 2
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 2
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 13
+ - num_epochs: 5.0
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch  | Step | Validation Loss |
+ | :-----------: | :----: | :--: | :-------------: |
+ | 0.6358        | 0.7619 | 20   | 0.5865          |
+ | 0.6379        | 1.5238 | 40   | 0.5621          |
+ | 0.6067        | 2.2857 | 60   | 0.5561          |
+ | 0.5339        | 3.0476 | 80   | 0.5515          |
+ | 0.6749        | 3.8095 | 100  | 0.5500          |
+ | 0.6351        | 4.5714 | 120  | 0.5497          |
+
+
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.40.0
+ - Pytorch 2.1.0+cu121
+ - Datasets 2.14.5
+ - Tokenizers 0.19.1
+
 
  ## Intended uses & limitations
 
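
The card's framework versions list PEFT 0.10.0, which suggests this checkpoint is a PEFT (e.g. LoRA) adapter trained on top of Qwen/Qwen1.5-7B-Chat rather than a fully merged model. Below is a minimal loading sketch under that assumption; the adapter repo id `WDong/06051615` is a guess assembled from the commit author and card title, and the prompt and generation settings are purely illustrative.

```python
# Sketch only: assumes this repo hosts a PEFT adapter for Qwen/Qwen1.5-7B-Chat.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen1.5-7B-Chat"   # base model named in the card
adapter_id = "WDong/06051615"      # hypothetical repo id; replace with the actual one

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

# Qwen1.5 ships a standard chat template, so trust_remote_code is not needed.
messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the adapter is LoRA-based, `model.merge_and_unload()` can fold it into the base weights before exporting a standalone model.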
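
The listed hyperparameters are consistent with each other: 2 samples per device × 2 GPUs × 4 gradient-accumulation steps gives the stated total train batch size of 16. Assuming the run used the Hugging Face `Trainer` (plausible given the Transformers and PEFT versions, but not stated), they would map onto `TrainingArguments` roughly as in this sketch; whether Native AMP meant fp16 or bf16 is not specified, so that flag is an assumption.

```python
from transformers import TrainingArguments

# Only values stated in the card are filled in; everything else stays at Trainer defaults.
# The Adam settings in the card (betas=(0.9, 0.999), epsilon=1e-08) already match the
# defaults of adam_beta1/adam_beta2/adam_epsilon, so they are not repeated here.
training_args = TrainingArguments(
    output_dir="06051615",          # placeholder output directory
    learning_rate=7e-5,
    per_device_train_batch_size=2,  # x 2 GPUs x 4 accumulation steps = 16 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_steps=13,
    seed=42,
    fp16=True,                      # "Native AMP"; bf16 would also fit the description
)
```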