Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ this model isn't really made for benchmarks, it's worse on everything besides AR
 | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 59.98 | **83.31** | **64.16** | 42.15 | **78.37** | **37.83** |
 | [crumb/92d52f-ame-full-7B](https://hf.co/crumb/92d52f-ame-full-7B) | **61.18** | 81.52 | 63.44 | **42.39** | 77.58 | 35.41 |
 
-it's got extra tokens which can all equally be used as masks, you can replace all instances of one token in context with one of the extra tokens (`[f'<ID-{i:06X}>' for i in range(2048)]`) to give the model an extra hard time. it was trained with context length 2048 on three separate replacement techniques through a schedule, with 80% of all sequences being completely replaced with the mask tokens near the end of training.
+it's got extra tokens which can all equally be used as masks, you can replace all instances of one token in context with one of the extra tokens (`[f'<ID-{i:06X}>' for i in range(2048)]`) to give the model an extra hard time. it was trained with context length 2048 on three separate replacement techniques through a schedule, with 80% of all sequences being completely replaced with the mask tokens near the end of training. it was trained over ~0.5B tokens
 
 > what? how is that useful?
 
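The mask-token replacement described in the changed README line can be sketched in plain Python. This is not the author's training code: the README doesn't say how the target token or the mask token is picked, or what the three scheduled replacement techniques are. So `mask_token` and `mask_all` are hypothetical helpers, and working on string tokens instead of tokenizer ids is an assumption. Only the `MASK_TOKENS` list comes straight from the README. `mask_all` is one guess at what "completely replaced" could mean: every distinct token gets its own mask token. A 2048-token context holds at most 2048 distinct tokens, so 2048 mask tokens are always enough for that one-to-one mapping.

```python
import random

# The extra tokens listed in the README: '<ID-000000>' through '<ID-0007FF>'.
MASK_TOKENS = [f'<ID-{i:06X}>' for i in range(2048)]


def mask_token(tokens, target, rng):
    """Hypothetical helper: replace every occurrence of `target`
    with one randomly chosen mask token (the same mask each time)."""
    mask = rng.choice(MASK_TOKENS)
    return [mask if t == target else t for t in tokens]


def mask_all(tokens, rng):
    """Hypothetical helper: replace every distinct token with its own
    mask token, keeping repeated tokens mapped to the same mask."""
    distinct = list(dict.fromkeys(tokens))  # distinct tokens, first-seen order
    if len(distinct) > len(MASK_TOKENS):
        raise ValueError("more distinct tokens than mask tokens")
    masks = rng.sample(MASK_TOKENS, len(distinct))  # no mask is reused
    mapping = dict(zip(distinct, masks))
    return [mapping[t] for t in tokens]


rng = random.Random(0)
seq = ["the", "cat", "sat", "on", "the", "mat"]
print(mask_token(seq, "the", rng))  # both "the" become the same <ID-...> token
print(mask_all(seq, rng))           # every token masked, repeats stay consistent
```

Because repeats share a mask, the model can still see that two positions hold the same (hidden) token, which is what makes the fully masked case solvable at all.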