Tags: Text Generation, Transformers, Safetensors, mistral, chat, conversational, text-generation-inference, Inference Endpoints
kalomaze committed on
Commit 9e06fc7
1 Parent(s): c0fbf5d

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -23,10 +23,13 @@ This is the sixth in a series of models designed to replicate the prose quality
 Model has been Instruct tuned with the Mistral formatting. A typical input would look like this:
 
 ```py
-"""[INST] Hi there! [/INST]Nice to meet you!</s>[INST] Can I ask a question? [/INST]
-"""
+<s>[INST] SYSTEM MESSAGE\nUSER MESSAGE[/INST] ASSISTANT MESSAGE</s>[INST] USER MESSAGE[/INST]
 ```
 
+We also provide the appropriate SillyTavern presets for [Context](https://huggingface.co/anthracite-org/Magnum-123b-v1/resolve/main/Magnum-Mistral-Context.json) and [Instruct](https://huggingface.co/anthracite-org/Magnum-123b-v1/raw/main/Magnum-Mistral-Instruct.json) respectively.
+
+The default Mistral preset included in SillyTavern seems to be misconfigured by default, so please keep these presets in mind.
+
 ## Credits
 - Stheno dataset (filtered)
 - [anthracite-org/kalo-opus-instruct-22k-no-refusal](anthracite-org/kalo-opus-instruct-22k-no-refusal)
@@ -42,9 +45,8 @@ In addition to this, we noticed that Mistral Large models seemed much more sensi
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/xCK3ISKF6pWcMyO7MEzTA.png)
 
 We hypothesize this is primarily due to the particularly narrow and low variance weight distributions typical of Mistral derived models regardless of their scale.
-In the end, we settled on 2e-6 with an effective batch size of 64 (and a packed tokens batch size of 8192; effectively ~500,000 tokens per batch).
 
-We also trained with a weight decay of 0.01 to help further stabilize the loss trajectory and mitigate overfitting.
+In the end, due to the costs that would be involved in training another full 2 epochs run ($600), we settled on our third attempt: 2e-6 with an effective batch size of 64, stopped earlier than the target 2 epochs.
 
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 
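As a rough sketch of how the updated Mistral template in this commit is assembled, the hypothetical helper `build_mistral_prompt` below renders (user, assistant) turns into the string shape shown in the new README. It assumes that the `\n` in the template stands for a real newline character and that the system message is attached only to the first user turn, as the template suggests; it is an illustration, not code from this repository or from SillyTavern.

```python
from typing import Optional, Sequence, Tuple


def build_mistral_prompt(
    turns: Sequence[Tuple[str, Optional[str]]],
    system: Optional[str] = None,
) -> str:
    """Render (user, assistant) turns in the README's Mistral template (hypothetical helper).

    The last turn may have assistant=None so the prompt ends at [/INST],
    leaving the next reply for the model to generate.
    """
    prompt = "<s>"
    for i, (user, assistant) in enumerate(turns):
        # Assumption: the system message is joined to the first user message
        # with a newline, as the template's "SYSTEM MESSAGE\nUSER MESSAGE" suggests.
        if i == 0 and system:
            user = f"{system}\n{user}"
        prompt += f"[INST] {user}[/INST]"
        if assistant is None:
            # Only the final turn may be left open for the model to complete.
            if i != len(turns) - 1:
                raise ValueError("only the last turn may omit the assistant reply")
        else:
            # Completed assistant replies are closed with the </s> end-of-sequence token.
            prompt += f" {assistant}</s>"
    return prompt


if __name__ == "__main__":
    print(
        build_mistral_prompt(
            [("Hi there!", "Nice to meet you!"), ("Can I ask a question?", None)],
            system="You are a helpful assistant.",
        )
    )
```

One hedged caveat: Hugging Face tokenizers for Mistral models typically prepend the `<s>` BOS token themselves when tokenizing with special tokens enabled, so tokenizing this string that way may produce a doubled `<s>`; check your tokenizer's behavior before relying on the literal token.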
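The "~500,000 tokens per batch" figure in the removed training line follows from multiplying the effective batch size by the packed sequence length. Written out, assuming "a packed tokens batch size of 8192" means each of the 64 samples in an effective batch is one packed sequence of 8192 tokens:

$$
64 \times 8192 = 524{,}288 \approx 5 \times 10^{5}\ \text{tokens per effective batch}
$$

which rounds to the quoted ~500,000.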