This is a double fine-tuned version of Mistral Small 24B Base 2501.
Stage 1 was shoving 30M tokens of human-written story content into it using completion training ([ToastyPigeon/ms3-base-roselily](https://huggingface.co/ToastyPigeon/ms3-base-roselily)), which is about half of my WIP Roselily dataset (~60M tokens total).
Stage 2 was teaching it to follow instructions (this model).
This model should accept (in theory) any of the following instruct formats:
**Tekken v7**
```
[SYSTEM_PROMPT]{system prompt}[/SYSTEM_PROMPT][INST]{user message}[/INST]{assistant response}</s>
```
**ChatML**
```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{assistant response}<|im_end|>
```
**Fizzpaca**
```
### System:
{system prompt}
### Instruction:
{user message}
### Response:
{assistant response}</s>
```
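For reference, here is a minimal sketch (not from the model card) of rendering a conversation into the Tekken v7 format shown above; the message structure and function name are assumptions for illustration:

```python
def to_tekken_v7(system_prompt: str, turns: list[tuple[str, str]]) -> str:
    """turns is a list of (user_message, assistant_response) pairs;
    leave the final assistant_response empty to prompt a completion."""
    prompt = f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
    for user_msg, assistant_msg in turns:
        prompt += f"[INST]{user_msg}[/INST]"
        if assistant_msg:
            # Completed assistant turns end with the EOS token.
            prompt += f"{assistant_msg}</s>"
    return prompt

print(to_tekken_v7("You are a storyteller.", [("Write an opening line.", "")]))
```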
The Tekken tokens were already in the tokenizer. Unused special tokens #20 and #21 were repurposed for the ChatML tokens. Fizzpaca did not add any new tokens.
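You can verify the repurposed tokens encode as single special tokens with a quick check like this sketch (assuming the `transformers` library; `"your-model-repo"` is a placeholder for this model's actual repo id):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-model-repo")
for t in ["<|im_start|>", "<|im_end|>", "</s>"]:
    ids = tok.encode(t, add_special_tokens=False)
    print(t, ids)  # each should come back as exactly one token id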
You may need to add both `</s>` and `<|im_end|>` as stop tokens for it to work properly with all formats.
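One way to register both stop tokens is to pass them as `eos_token_id` values during generation. A minimal sketch, again assuming `transformers` and a placeholder repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-model-repo")
model = AutoModelForCausalLM.from_pretrained("your-model-repo")

# Stop on whichever end-of-turn token the model emits.
stop_ids = [tok.convert_tokens_to_ids(t) for t in ["</s>", "<|im_end|>"]]

inputs = tok("[INST]Write an opening line.[/INST]", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, eos_token_id=stop_ids)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Frontends like SillyTavern or text-generation-webui expose the same idea as custom stopping strings in their sampler settings.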