Update README.md

README.md

This is the sixth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus.

The model has been Instruct tuned with Mistral formatting. A typical input would look like this:

```py
<s>[INST] SYSTEM MESSAGE\nUSER MESSAGE[/INST] ASSISTANT MESSAGE</s>[INST] USER MESSAGE[/INST]
```
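
For reference, the same template can be assembled in code. The snippet below is only an illustrative sketch of the format above; `build_prompt` and its argument names are ours, not part of the model's own tooling:

```py
# Illustrative sketch of the Mistral-style template shown above.
# build_prompt is a hypothetical helper, not part of any official tooling.
def build_prompt(system: str, turns: list[tuple[str, str]], next_user: str) -> str:
    prompt = "<s>"
    first = True
    for user, assistant in turns:
        # the system message is folded into the first [INST] block
        content = f"{system}\n{user}" if first else user
        prompt += f"[INST] {content}[/INST] {assistant}</s>"
        first = False
    # trailing user turn that the model will respond to
    content = f"{system}\n{next_user}" if first else next_user
    prompt += f"[INST] {content}[/INST]"
    return prompt

print(build_prompt("You are a helpful writer.", [("Hello!", "Hi there.")], "Continue the story."))
```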

We also provide the appropriate SillyTavern presets for [Context](https://huggingface.co/anthracite-org/Magnum-123b-v1/resolve/main/Magnum-Mistral-Context.json) and [Instruct](https://huggingface.co/anthracite-org/Magnum-123b-v1/raw/main/Magnum-Mistral-Instruct.json) respectively.

The default Mistral preset included in SillyTavern seems to be misconfigured, so please use these presets instead.
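
If you would rather fetch these preset files programmatically than through the links above, a minimal sketch with `huggingface_hub` (repo id and filenames taken from those URLs) would be:

```py
# Minimal sketch: download the SillyTavern preset JSONs from the model repo.
from huggingface_hub import hf_hub_download

context_path = hf_hub_download(
    repo_id="anthracite-org/Magnum-123b-v1",
    filename="Magnum-Mistral-Context.json",
)
instruct_path = hf_hub_download(
    repo_id="anthracite-org/Magnum-123b-v1",
    filename="Magnum-Mistral-Instruct.json",
)
print(context_path, instruct_path)
```
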
## Credits
- Stheno dataset (filtered)
- [anthracite-org/kalo-opus-instruct-22k-no-refusal](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-22k-no-refusal)

In addition to this, we noticed that Mistral Large models seemed much more sensitive to learning rate than other models.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/xCK3ISKF6pWcMyO7MEzTA.png)

We hypothesize this is primarily due to the particularly narrow, low-variance weight distributions typical of Mistral-derived models, regardless of their scale.

In the end, due to the cost involved in training another full two-epoch run ($600), we settled on our third attempt: a learning rate of 2e-6 with an effective batch size of 64 (a packed token batch size of 8192, roughly 500,000 tokens per batch), stopped earlier than the target 2 epochs.
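
As a rough sanity check on those numbers (a back-of-the-envelope sketch, not a training config):

```py
# Back-of-the-envelope check of the batch configuration described above.
effective_batch_size = 64   # sequences per optimizer step
packed_batch_tokens = 8192  # packed token batch size per sequence
print(effective_batch_size * packed_batch_tokens)  # 524288, i.e. roughly 500,000 tokens per batch
```
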
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)