Update README.md
README.md
CHANGED
@@ -14,6 +14,7 @@ datasets:
 - BEE-spoke-data/fineweb-1M_longish
 language:
 - en
+inference: false
 ---
 
 # jamba-900M-v0.13-KIx2
@@ -22,14 +23,18 @@ language:
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
 </a>
 
-
+> The API widget is off as it isn't supported by hf yet - try the Colab
+
+This is a pretraining experiment on the `jamba` arch as a "smol MoE".
+
+Details:
 
 - pretrained at context length 16384
 - seen approx 20b tokens
 - uses Claude3 tokenizer (as hf GPT2 tokenizer)
 - hidden size 1024, 12 layers, 8 experts
 
-
+achieves the following results on the evaluation set (_ of the latest dataset_):
 - Loss: 3.0366
 - Accuracy: 0.4514
 - Num Input Tokens Seen: 1975517184