pszemraj committed on
Commit da818b5
1 Parent(s): 3121335

End of training

README.md ADDED
@@ -0,0 +1,137 @@
+ ---
+ license: apache-2.0
+ base_model: pszemraj/Mistral-7B-v0.3-prune6
+ tags:
+ - axolotl
+ - generated_from_trainer
+ model-index:
+ - name: Mistral-v0.3-6B-ii
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: pszemraj/Mistral-7B-v0.3-prune6
+ model_type: MistralForCausalLM
+ tokenizer_type: LlamaTokenizer
+
+ strict: false
+ seed: 80085
+ max_steps: 2000
+ # dataset
+ datasets:
+   - path: BEE-spoke-data/knowledge-inoc-concat-v1
+     name: smorgasbord-tb-quality
+     type: completion
+     field: text
+ val_set_size: 0.01
+
+ sequence_len: 4096
+ sample_packing: true
+ pad_to_sequence_len: false
+ train_on_inputs: false
+ group_by_length: false
+
+ # WANDB
+ wandb_project: llama3-pruning
+ wandb_entity: pszemraj
+ wandb_watch: gradients
+ wandb_name: Mistral-6B-v0.3-v0.1-ii
+ hub_model_id: pszemraj/Mistral-v0.3-6B-ii
+ hub_strategy: every_save
+
+ gradient_accumulation_steps: 16
+ micro_batch_size: 1
+ num_epochs: 1
+ optimizer: paged_adamw_32bit
+ weight_decay: 0.1
+ lr_scheduler: cosine
+ learning_rate: 2e-5
+ warmup_ratio: 0.1
+
+ load_in_8bit: false
+ load_in_4bit: false
+ bfloat16: true
+ tf32: true
+
+ flash_attention: true
+ torch_compile: true
+ torch_compile_backend: inductor
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+
+ # hyperparams for freq of evals, saving, etc
+ evals_per_epoch: 5
+ saves_per_epoch: 5
+ save_safetensors: true
+ save_total_limit: 1
+ output_dir: /workspace/output-axolotl/output-model-6b
+ logging_steps: 6
+
+ deepspeed:
+
+ special_tokens:
+
+ ```
+
+ </details><br>
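+
+ With `micro_batch_size: 1` and `gradient_accumulation_steps: 16`, each optimizer step sees 16 packed sequences of up to 4096 tokens. A rough token-budget estimate for the 2000-step run (an upper bound, assuming sample packing keeps sequences close to full):
+
+ ```python
+ # Back-of-the-envelope token budget from the config above.
+ sequence_len = 4096
+ micro_batch_size = 1
+ gradient_accumulation_steps = 16
+ max_steps = 2000
+
+ tokens_per_step = sequence_len * micro_batch_size * gradient_accumulation_steps
+ total_tokens = tokens_per_step * max_steps
+ print(f"{tokens_per_step:,} tokens/step, ~{total_tokens / 1e6:.0f}M tokens total")
+ # 65,536 tokens/step, ~131M tokens total
+ ```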
+
+ # Mistral-v0.3-6B-ii
+
+ This model is a fine-tuned version of [pszemraj/Mistral-7B-v0.3-prune6](https://huggingface.co/pszemraj/Mistral-7B-v0.3-prune6) on the BEE-spoke-data/knowledge-inoc-concat-v1 dataset (`smorgasbord-tb-quality` config).
+ It achieves the following results on the evaluation set:
+ - Loss: 1.2860
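+
+ For intuition, this cross-entropy corresponds to a token-level perplexity of roughly exp(1.2860) ≈ 3.6:
+
+ ```python
+ import math
+
+ eval_loss = 1.2860  # reported validation cross-entropy (nats per token)
+ print(f"perplexity ≈ {math.exp(eval_loss):.2f}")  # perplexity ≈ 3.62
+ ```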
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
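+
+ A minimal inference sketch with the standard `transformers` API (untested; adjust dtype and device settings to your hardware):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "pszemraj/Mistral-v0.3-6B-ii"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # training ran in bfloat16
+     device_map="auto",           # requires `accelerate`
+ )
+
+ # this is a base (completion-style) model, so prompt it with plain text
+ prompt = "The history of the Roman Empire"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```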
+
+ ## Training and evaluation data
+
+ Per the axolotl config above, training used the `BEE-spoke-data/knowledge-inoc-concat-v1` dataset (`smorgasbord-tb-quality` config, `text` field, completion format), with 1% of the data held out for evaluation (`val_set_size: 0.01`).
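+
+ To peek at the data, something like the following should work (a sketch; it assumes the axolotl `name` field maps to a dataset configuration of the same name):
+
+ ```python
+ from datasets import load_dataset
+
+ # dataset path and config name taken from the axolotl YAML above
+ ds = load_dataset("BEE-spoke-data/knowledge-inoc-concat-v1", "smorgasbord-tb-quality")
+ print(ds)
+ print(ds["train"][0]["text"][:500])  # completion training used the `text` field
+ ```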
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (the resulting learning-rate schedule is sketched below):
+ - learning_rate: 2e-05
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 80085
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 200
+ - training_steps: 2000
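+
+ The cosine schedule with 200 warmup steps (10% of the 2000 training steps) can be reproduced roughly with the standard `transformers` helper:
+
+ ```python
+ import torch
+ from transformers import get_cosine_schedule_with_warmup
+
+ # Sketch of the schedule only: the actual run used bitsandbytes'
+ # paged_adamw_32bit optimizer, but the LR curve should be the same.
+ params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameters
+ optimizer = torch.optim.AdamW(params, lr=2e-5, weight_decay=0.1)
+ scheduler = get_cosine_schedule_with_warmup(
+     optimizer,
+     num_warmup_steps=200,     # warmup_ratio 0.1 * max_steps 2000
+     num_training_steps=2000,
+ )
+
+ lrs = []
+ for _ in range(2000):
+     optimizer.step()          # no-op here (no gradients), keeps the call order correct
+     scheduler.step()
+     lrs.append(scheduler.get_last_lr()[0])
+ print(f"peak LR {max(lrs):.2e} reached at step {lrs.index(max(lrs)) + 1}")
+ # peak LR 2.00e-05 reached at step 200, then cosine decay toward 0
+ ```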
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | No log | 0.0002 | 1 | 1.5980 |
+ | 1.578 | 0.0955 | 400 | 1.4028 |
+ | 1.5828 | 0.1911 | 800 | 1.3809 |
+ | 1.4355 | 0.2866 | 1200 | 1.3152 |
+ | 1.4618 | 0.3822 | 1600 | 1.2877 |
+ | 1.4551 | 0.4777 | 2000 | 1.2860 |
+
+
+ ### Framework versions
+
+ - Transformers 4.40.2
+ - Pytorch 2.3.0+cu118
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:46c5e3e1c7f14c651b8954733db0fe00d1853e8a5bd0eccae1b0073ec7067bf4
+ oid sha256:f6387739d7045c7530789f372298a8a81decd83c89d78b6e85d2ce23a924509c
  size 4949453792
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:259fe40447303fc97bb4d606b03ee2f84df297340acc35ec9d0de47df6cb070a
+ oid sha256:3efbae4d9a6e86249c8121a5f6e89c79e111f0b3079a56e539899847f679644c
  size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b698cf45add049b21c866559ed166c0aff5ef78d4582a2cc8b37d4242b239cde
+ oid sha256:00098a24611641eccceba7b3ed58032ff1f2cf486050f0fc3716fb0cd4fbd3fc
  size 1929457496