lucyknada committed on
Commit 6678f59 · verified · 1 Parent(s): e04f522

Update README.md

Files changed (1)
  1. README.md +90 -47
README.md CHANGED
@@ -1,20 +1,90 @@
 ---
 license: gemma
 base_model: IntervitensInc/gemma-2-9b-chatml
- tags:
- - generated_from_trainer
 model-index:
 - name: magnum-v3-9b-chatml
   results: []
 ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 <details><summary>See axolotl config</summary>

- axolotl version: `0.4.1`
 ```yaml
 base_model: IntervitensInc/gemma-2-9b-chatml
 model_type: AutoModelForCausalLM
@@ -101,52 +171,25 @@ weight_decay: 0.05
 fsdp:
 fsdp_config:
 special_tokens:
- ```

 </details><br>

- # magnum-v3-9b-chatml
-
- This model is a fine-tuned version of [IntervitensInc/gemma-2-9b-chatml](https://huggingface.co/IntervitensInc/gemma-2-9b-chatml) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 6e-06
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 50
- - num_epochs: 2

- ### Training results

- ### Framework versions

- - Transformers 4.44.0
- - Pytorch 2.4.0+cu121
- - Datasets 2.20.0
- - Tokenizers 0.19.1
 
 ---
 license: gemma
 base_model: IntervitensInc/gemma-2-9b-chatml
 model-index:
 - name: magnum-v3-9b-chatml
   results: []
 ---

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a46cbfb9c2bdfae75b3a6/9ZBUlmzDCnNmQEdUUbyEL.png)
+
+ This is the 11th in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus.
+
+ This model is fine-tuned on top of [IntervitensInc/gemma-2-9b-chatml](https://huggingface.co/IntervitensInc/gemma-2-9b-chatml) (a ChatML-ified gemma-2-9b).
+
+ ## Prompting
+ The model has been instruct-tuned with ChatML formatting. A typical input would look like this:
+
+ ```py
+ """<|im_start|>system
+ system prompt<|im_end|>
+ <|im_start|>user
+ Hi there!<|im_end|>
+ <|im_start|>assistant
+ Nice to meet you!<|im_end|>
+ <|im_start|>user
+ Can I ask a question?<|im_end|>
+ <|im_start|>assistant
+ """
+ ```
+
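
Outside of a frontend like SillyTavern, the same ChatML turns can be assembled with the `transformers` chat-template API. A minimal sketch, assuming the tokenizer in this repo ships the ChatML chat template and that the weights live at `anthracite-org/magnum-v3-9b-chatml` (repo id inferred from the model name, so adjust as needed):

```py
# Minimal generation sketch. The repo id below is an assumption inferred from the
# model name; point it at wherever the weights are actually hosted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthracite-org/magnum-v3-9b-chatml"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
]

# apply_chat_template renders the <|im_start|>/<|im_end|> turns shown above and,
# with add_generation_prompt=True, appends the assistant header so the model replies.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The `messages` list maps one-to-one onto the raw ChatML string shown earlier; the chat template only adds the turn markers and the trailing assistant header.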
+ ## SillyTavern templates
+
+ Below are Instruct and Context templates for use within SillyTavern.
+
+ <details><summary>context template</summary>
+
+ ```yaml
+ {
+   "story_string": "<|im_start|>system\n{{#if system}}{{system}}\n{{/if}}{{#if wiBefore}}{{wiBefore}}\n{{/if}}{{#if description}}{{description}}\n{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}\n{{/if}}{{#if scenario}}Scenario: {{scenario}}\n{{/if}}{{#if wiAfter}}{{wiAfter}}\n{{/if}}{{#if persona}}{{persona}}\n{{/if}}{{trim}}<|im_end|>\n",
+   "example_separator": "",
+   "chat_start": "",
+   "use_stop_strings": false,
+   "allow_jailbreak": false,
+   "always_force_name2": true,
+   "trim_sentences": false,
+   "include_newline": false,
+   "single_line": false,
+   "name": "Magnum ChatML"
+ }
+ ```
+
+ </details><br>
+ <details><summary>instruct template</summary>
+
+ ```yaml
+ {
+   "system_prompt": "You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}.",
+   "input_sequence": "<|im_start|>user\n",
+   "output_sequence": "<|im_start|>assistant\n",
+   "last_output_sequence": "",
+   "system_sequence": "<|im_start|>system\n",
+   "stop_sequence": "<|im_end|>",
+   "wrap": false,
+   "macro": true,
+   "names": true,
+   "names_force_groups": true,
+   "activation_regex": "",
+   "system_sequence_prefix": "",
+   "system_sequence_suffix": "",
+   "first_output_sequence": "",
+   "skip_examples": false,
+   "output_suffix": "<|im_end|>\n",
+   "input_suffix": "<|im_end|>\n",
+   "system_suffix": "<|im_end|>\n",
+   "user_alignment_message": "",
+   "system_same_as_user": false,
+   "last_system_sequence": "",
+   "name": "Magnum ChatML"
+ }
+ ```
+
+ </details><br>
+
+ ## Axolotl config

 <details><summary>See axolotl config</summary>

 ```yaml
 base_model: IntervitensInc/gemma-2-9b-chatml
 model_type: AutoModelForCausalLM

 fsdp:
 fsdp_config:
 special_tokens:

+ ```
 </details><br>

+ ## Credits
+ We'd like to thank Recursal / Featherless for sponsoring the compute for this train. Featherless has been hosting our Magnum models since the first 72B and has given thousands of people access to our models, helping us grow.

+ We would also like to thank all members of Anthracite who made this finetune possible.
+
+ The following datasets were used for this finetune:
+
+ - [anthracite-org/stheno-filtered-v1.1](https://huggingface.co/datasets/anthracite-org/stheno-filtered-v1.1)
+ - [anthracite-org/kalo-opus-instruct-22k-no-refusal](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-22k-no-refusal)
+ - [anthracite-org/nopm_claude_writing_fixed](https://huggingface.co/datasets/anthracite-org/nopm_claude_writing_fixed)
+ - [Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned)
+ - [Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned)

+ ## Training
+ The training was done for 2 epochs. We used 8x [H100](https://www.nvidia.com/en-us/data-center/h100/) GPUs graciously provided by [Recursal AI](https://recursal.ai/) / [Featherless AI](https://featherless.ai/) for the full-parameter fine-tuning of the model.
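
For reference, the hyperparameters listed in the auto-generated card that this revision replaces work out to an effective batch size of 1 per device × 8 GPUs × 8 gradient-accumulation steps = 64 sequences per optimizer step, with a cosine schedule, 50 warmup steps, and a peak learning rate of 6e-06.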

+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

+ ## Safety
+ ...