lukeleeai committed
Commit
79233a0
1 Parent(s): 333109a

End of training

README.md CHANGED
@@ -4,18 +4,18 @@ base_model: mistralai/Mistral-7B-v0.1
 tags:
 - generated_from_trainer
 model-index:
-- name: Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-13
+- name: Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-14
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-13
+# Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-14
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 9.8327
+- Loss: 3.2117
 
 ## Model description
 
@@ -39,10 +39,10 @@ The following hyperparameters were used during training:
 - eval_batch_size: 1
 - seed: 0
 - distributed_type: multi-GPU
-- num_devices: 2
+- num_devices: 4
 - gradient_accumulation_steps: 8
-- total_train_batch_size: 16
-- total_eval_batch_size: 2
+- total_train_batch_size: 32
+- total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - training_steps: 200
@@ -51,14 +51,14 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 3.7951        | 0.0   | 25   | 2.4027          |
-| 3.642         | 0.01  | 50   | 2.3900          |
-| 3.6958        | 0.01  | 75   | 2.3846          |
-| 3.5839        | 0.02  | 100  | 2.3938          |
-| 3.5473        | 0.02  | 125  | 2.4562          |
-| 3.5564        | 0.02  | 150  | 2.5087          |
-| 3.4657        | 0.03  | 175  | 2.5109          |
-| 3.4677        | 0.03  | 200  | 2.5261          |
+| 3.6973        | 0.01  | 25   | 2.3992          |
+| 3.6504        | 0.02  | 50   | 2.3855          |
+| 3.6737        | 0.02  | 75   | 2.3872          |
+| 3.5868        | 0.03  | 100  | 2.4532          |
+| 3.5604        | 0.04  | 125  | 2.4999          |
+| 3.4312        | 0.05  | 150  | 2.5201          |
+| 3.3355        | 0.06  | 175  | 2.5216          |
+| 3.3825        | 0.06  | 200  | 2.5236          |
 
 
 ### Framework versions
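The hyperparameter changes above are consistent with each other: doubling `num_devices` from 2 to 4 is what moves `total_train_batch_size` from 16 to 32 and `total_eval_batch_size` from 2 to 4, since the HF Trainer reports these as derived values. A minimal sketch of that arithmetic, assuming a per-device train batch size of 1 (only `eval_batch_size: 1` is visible in the diff, so the train value is an assumption):

```python
# Effective batch sizes as the HF Trainer derives them:
#   total_train = per_device_batch * num_devices * gradient_accumulation_steps
#   total_eval  = per_device_batch * num_devices

def total_train_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    return per_device * num_devices * grad_accum

def total_eval_batch_size(per_device: int, num_devices: int) -> int:
    return per_device * num_devices

# Old run on 2 GPUs vs. new run on 4 GPUs:
print(total_train_batch_size(1, 2, 8))  # 16
print(total_train_batch_size(1, 4, 8))  # 32
print(total_eval_batch_size(1, 4))      # 4
```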
config.json CHANGED
@@ -23,38 +23,38 @@
 "rope_theta": 10000.0,
 "sliding_window": 4096,
 "thresholds": [
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0,
-  0.0
+  0.0631895586848259,
+  0.07923770695924759,
+  0.089267797768116,
+  0.10732196271419525,
+  0.12738214433193207,
+  0.1414242684841156,
+  0.15346036851406097,
+  0.16349045932292938,
+  0.1675025075674057,
+  0.1675025075674057,
+  0.1675025075674057,
+  0.1735205501317978,
+  0.17552657425403595,
+  0.1775325983762741,
+  0.18756268918514252,
+  0.1935807317495346,
+  0.19759276509284973,
+  0.21364091336727142,
+  0.22367100417613983,
+  0.23169508576393127,
+  0.22367100417613983,
+  0.22968906164169312,
+  0.22367100417613983,
+  0.22367100417613983,
+  0.23169508576393127,
+  0.23971915245056152,
+  0.2457372099161148,
+  0.2577733099460602,
+  0.2678034007549286,
+  0.27382147312164307,
+  0.27582746744155884,
+  0.277833491563797
 ],
 "tie_word_embeddings": false,
 "torch_dtype": "bfloat16",
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d02bd152af94684e1f5985811068428480e16588820437e00141f69d460ee7a
+oid sha256:824e1239d9fae01a0d92076a96352399912e41d2ab3770881a2165c5004dae3e
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2b1ee2301c7351b6c17a5ad1bc6e051bee91d07d6c423f11a0b0df809dba126f
+oid sha256:6a14f497d6e09056c31f8d91a735b9025435eab785f3842a46e2377147881a34
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:15717942cb0df08ca91d09d06dbeb1ad2295bc2bc6f5d9f6da0a6e67714fef5e
+oid sha256:f7814c4a55aa339eeed03a589db54cc81730b2e3484eebd1155b908483efea73
 size 4540516344
sparsification_sftt.py CHANGED
@@ -585,7 +585,7 @@ class GracefulRegularizationScheduler(TrainerCallback):
         enable_sparse_silu(base_model)
         self.trainer.evaluate()
         save_act_hist(base_model, self.act_hist_path)
-        set_sparse_threshold(base_model, self.targeted_sparsity, True)
+        set_sparse_threshold(base_model, self.targeted_sparsity, False)
         deactivate_stats(base_model)
         self.trainer.use_sparse_regularization = self.keep_regularization_with_kill
         # set_layer_specific_regularization(model.get_base_model())
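The surrounding calls (`save_act_hist`, `self.targeted_sparsity`) suggest `set_sparse_threshold` picks each layer's threshold from collected activation statistics so that a target fraction of activations falls below it. The real implementation is not shown in this hunk, and the meaning of the flipped boolean argument is not visible here; the following is only a hypothetical sketch of the quantile idea the function name suggests:

```python
# Hypothetical sketch: choose a threshold so that `targeted_sparsity`
# of the observed activation magnitudes fall below it. The actual
# set_sparse_threshold() in sparsification_sftt.py may differ.

def threshold_for_sparsity(activations: list[float], targeted_sparsity: float) -> float:
    ordered = sorted(abs(a) for a in activations)
    # Index of the empirical quantile at the targeted sparsity level.
    idx = min(int(targeted_sparsity * len(ordered)), len(ordered) - 1)
    return ordered[idx]

acts = [0.01, 0.02, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]
t = threshold_for_sparsity(acts, 0.9)  # 90% target, matching the "90p" in the model name
print(t)  # 6.4
```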