pszemraj committed
Commit 4fadd33
1 Parent(s): 73c1c8a

End of training

README.md ADDED
@@ -0,0 +1,142 @@
---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.3
tags:
- axolotl
- generated_from_trainer
model-index:
- name: Mistral-7B-v0.3-sarcasm-scrolls-2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-7B-v0.3
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

strict: false

# dataset
datasets:
  - path: BEE-spoke-data/sarcasm-scrolls
    type: completion # format from earlier
    field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.025

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: sarcasm-scrolls
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: Mistral-7B-v0.3-sarcasm-scrolls
hub_model_id: pszemraj/Mistral-7B-v0.3-sarcasm-scrolls-2
hub_strategy: every_save

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_fused # paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 2e-5

load_in_8bit: false
load_in_4bit: false
bf16: auto
fp16:
tf32: true

torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
logging_steps: 5
xformers_attention:
flash_attention: true

warmup_steps: 20
# hyperparams for freq of evals, saving, etc
evals_per_epoch: 4
saves_per_epoch: 4
save_safetensors: true
save_total_limit: 1 # Checkpoints saved at a time
output_dir: ./output-axolotl/output-model-theta
resume_from_checkpoint:


deepspeed:
weight_decay: 0.04

special_tokens:

```

</details><br>
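
To reproduce a run like this, the YAML above is typically saved to a file and launched through axolotl's CLI, e.g. `accelerate launch -m axolotl.cli.train sarcasm-scrolls.yaml` (the filename here is illustrative; check the axolotl 0.4.1 documentation for the exact invocation and environment setup).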

# Mistral-7B-v0.3-sarcasm-scrolls-2

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3) on the [BEE-spoke-data/sarcasm-scrolls](https://huggingface.co/datasets/BEE-spoke-data/sarcasm-scrolls) dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2825

## Model description

This is a full-parameter fine-tune (no LoRA or quantized loading) of Mistral-7B-v0.3 for completion-style text generation, trained on packed sequences of up to 4096 tokens from the sarcasm-scrolls corpus; see the axolotl config above for the exact settings.

## Intended uses & limitations

More information needed. Note that this is a completion-style (non-instruct) fine-tune: it continues free-form text rather than following chat-formatted prompts. A minimal usage sketch is shown below.
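
As a rough usage sketch (this assumes the standard `transformers` text-generation API; the prompt and sampling settings are purely illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/Mistral-7B-v0.3-sarcasm-scrolls-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16; fp16/fp32 also work
    device_map="auto",
)

# Completion-style model: pass plain text rather than a chat template.
prompt = "The meeting could have been an email, but"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```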

## Training and evaluation data

Training used the `BEE-spoke-data/sarcasm-scrolls` dataset in axolotl's `completion` format (plain text taken from the `text` field), with 2.5% of the data held out as the evaluation split (`val_set_size: 0.025`).
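
Axolotl builds the evaluation split internally; a rough equivalent with the `datasets` library (assuming the dataset exposes a single `train` split, and noting that axolotl's own splitting and seed handling may differ) would be:

```python
from datasets import load_dataset

# Load the corpus and carve out a 2.5% evaluation split,
# mirroring `val_set_size: 0.025` from the axolotl config.
ds = load_dataset("BEE-spoke-data/sarcasm-scrolls", split="train")
split = ds.train_test_split(test_size=0.025, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```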

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an approximate `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 2
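
For readers more used to the Hugging Face `Trainer`, the list above maps roughly onto the following `TrainingArguments`. This is only a sketch: axolotl assembles the real arguments internally, and fields such as the output directory and fused AdamW optimizer come from the config shown earlier rather than from this card.

```python
from transformers import TrainingArguments

# Approximate equivalent of the hyperparameters listed above (a sketch,
# not the exact object axolotl constructed during training).
training_args = TrainingArguments(
    output_dir="./output-axolotl/output-model-theta",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size: 1 x 16 = 16
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    optim="adamw_torch_fused",
    weight_decay=0.04,
    bf16=True,
    tf32=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    torch_compile=True,
    logging_steps=5,
    save_total_limit=1,
    seed=42,
)
```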

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0082 | 1    | 2.3959          |
| 2.412         | 0.2544 | 31   | 2.3363          |
| 2.3866        | 0.5087 | 62   | 2.3277          |
| 2.3204        | 0.7631 | 93   | 2.3012          |
| 2.2843        | 1.0174 | 124  | 2.2682          |
| 2.1748        | 1.2718 | 155  | 2.2425          |
| 1.6885        | 1.2349 | 186  | 2.2849          |
| 1.6834        | 1.4892 | 217  | 2.2825          |
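
For context, the final validation loss of 2.2825 corresponds to a perplexity of roughly exp(2.2825) ≈ 9.8 on the held-out split.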

### Framework versions

- Transformers 4.41.1
- Pytorch 2.3.0+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2d7de8eb9bd554cccc999649fd5a3a28a44d09042879d6c1cedd13f59c86f7a8
+oid sha256:460e24cce81f866368e722ad079e57f7d31e91b88b3fc92386c4bc31b0b87b8f
 size 4949453792
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:79a5f871a0fbb7919f263d78caff0d4d3b7a1d444552db3011ada70f48fd5ee4
+oid sha256:0d78f64d217d2d96362805425d8a03f2e38293e661904c6a7177030169d5b19e
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ba7ae59358f726f5f5bc5264fd8a3aec907f42e6009d7b22454007d18723616a
+oid sha256:2f8eee2c3fbf385373fcb1ab32ba354f9628afa08ffb0d366ed11dde71eee5bf
 size 4546807800