RikkiXu committed on
Commit
416c563
1 Parent(s): 6202d51

Model save

README.md CHANGED
@@ -1,4 +1,6 @@
 ---
+license: apache-2.0
+base_model: mistralai/Mistral-7B-v0.1
 tags:
 - trl
 - sft
@@ -15,9 +17,9 @@ should probably proofread and complete it, then remove this comment. -->

 # zephyr-7b-sft-full

-This model was trained from scratch on the generator dataset.
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.3747
+- Loss: 0.9156

 ## Model description

@@ -36,14 +38,14 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- learning_rate: 5e-06
-- train_batch_size: 8
-- eval_batch_size: 4
+- learning_rate: 2e-05
+- train_batch_size: 16
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 8
-- total_train_batch_size: 64
-- total_eval_batch_size: 32
+- total_train_batch_size: 128
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -53,12 +55,12 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.8468        | 1.0   | 212  | 1.3747          |
+| 0.9189        | 1.0   | 1107 | 0.9156          |


 ### Framework versions

-- Transformers 4.38.2
+- Transformers 4.39.3
 - Pytorch 2.1.2+cu118
 - Datasets 2.16.1
 - Tokenizers 0.15.2
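The updated hyperparameters are consistent with the stated totals: 16 per-device train samples across 8 GPUs gives 128, and 8 per-device eval samples gives 64. A minimal sketch of that arithmetic, assuming gradient_accumulation_steps is 1 (the diff does not show it):

```python
# Sketch: how the total_*_batch_size values in the updated card follow from
# the per-device settings. gradient_accumulation_steps=1 is an assumption,
# since the diff does not include that field.
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Effective batch size across all devices and accumulation steps."""
    return per_device * num_devices * grad_accum

train_total = effective_batch_size(per_device=16, num_devices=8)  # 128
eval_total = effective_batch_size(per_device=8, num_devices=8)    # 64
print(train_total, eval_total)
```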
generation_config.json CHANGED
@@ -1,6 +1,6 @@
 {
   "_from_model_config": true,
   "bos_token_id": 1,
-  "eos_token_id": 32000,
-  "transformers_version": "4.38.2"
+  "eos_token_id": 2,
+  "transformers_version": "4.39.3"
 }
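The notable change here is `eos_token_id` moving from 32000 (likely an added special token from a previous tokenizer setup) back to 2, Mistral's standard end-of-sequence token. The new file can be sanity-checked with the standard library alone; the JSON string below mirrors the post-commit version:

```python
# Sketch: validating the updated generation_config.json without loading the
# model. The config text mirrors the new version from this commit.
import json

config_text = """
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.39.3"
}
"""
config = json.loads(config_text)
# With eos_token_id=2 the model stops on the base tokenizer's </s> token
# instead of the previously configured id 32000.
assert config["eos_token_id"] == 2
print(config["transformers_version"])
```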
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bc7cb9e4badbf8993c1426656d83a5ae438c39b7ded5e918ff101906d47cdf44
-size 4943178720
+oid sha256:97bc9ba5fda3820613528c7020adc08e0324d9d17eb34b2df064f9466c077ab6
+size 4943162336
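These weight files are Git LFS pointers: the repo stores only the SHA-256 digest and byte size, not the payload. A small sketch of parsing such a pointer and checking a local download against it (`verify_download` assumes the shard has already been fetched to `path`):

```python
# Sketch: parsing a Git LFS pointer like the ones in this diff, and hashing
# a downloaded shard to confirm it matches the pointer's oid and size.
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Turn the 'key value' lines of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def verify_download(path: str, pointer: dict) -> bool:
    """Hash the file in 1 MiB chunks and compare against the pointer."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:97bc9ba5fda3820613528c7020adc08e0324d9d17eb34b2df064f9466c077ab6\n"
    "size 4943162336\n"
)
```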
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:012a56111d83215c50a688f649ae445d3835d2ff9e1c35dde0ef9d90a163150f
+oid sha256:6465729604922a61a5823ced600e8f4687c65d7d905528cc934843c6815fcdc8
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:676c1a2fb57fd3e8f69cd479063cc2c9c72dade61e81cb89f912f0d42442068c
-size 4540532728
+oid sha256:e4f90c8a0b887397147f13592416daeda1551a36a7951d8b609a064cfe8cd8aa
+size 4540516344
model.safetensors.index.json CHANGED
@@ -1,6 +1,6 @@
 {
   "metadata": {
-    "total_size": 14483496960
+    "total_size": 14483464192
   },
   "weight_map": {
     "lm_head.weight": "model-00003-of-00003.safetensors",
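The index maps each tensor name to the shard that stores it, and `metadata.total_size` counts tensor bytes only, so it is close to but not exactly the sum of the shard file sizes (each shard also carries a JSON header). A sketch of looking up a weight's shard, using a fragment that mirrors the updated index (weight_map truncated to the one entry visible in the diff):

```python
# Sketch: reading model.safetensors.index.json to find which shard holds a
# given weight. The fragment mirrors the post-commit index; only the one
# weight_map entry shown in the diff is included here.
import json

index_text = """
{
  "metadata": { "total_size": 14483464192 },
  "weight_map": { "lm_head.weight": "model-00003-of-00003.safetensors" }
}
"""
index = json.loads(index_text)
shard = index["weight_map"]["lm_head.weight"]
print(shard)  # model-00003-of-00003.safetensors
```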
runs/Jun12_23-56-37_n136-129-074/events.out.tfevents.1718210521.n136-129-074.3717215.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dd40992adc7954210a53463a6b9f1515ebb3d374c7951c7b3f3bcc8940690f82
-size 51281
+oid sha256:7f2aed5119ca9a8d3b598059ce7b678e97182c035a061146d7524dfba04d893c
+size 52117