Commit c2efc63 (verified) by pszemraj · Parent: 6e546f2

Update README.md

Files changed (1):
  1. README.md +7 -2
README.md CHANGED
@@ -14,6 +14,7 @@ datasets:
 - BEE-spoke-data/fineweb-1M_longish
 language:
 - en
+inference: false
 ---
 
 # jamba-900M-v0.13-KIx2
@@ -22,14 +23,18 @@ language:
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
 </a>
 
-This is a pretraining experiment on the `jamba` arch as a "smol MoE". Details:
+> The API widget is off as it isn't supported by hf yet - try the Colab
+
+This is a pretraining experiment on the `jamba` arch as a "smol MoE".
+
+Details:
 
 - pretrained at context length 16384
 - seen approx 20b tokens
 - uses Claude3 tokenizer (as hf GPT2 tokenizer)
 - hidden size 1024, 12 layers, 8 experts
 
-most recent dataset, achieves the following results on the evaluation set:
+achieves the following results on the evaluation set (_of the latest dataset_):
 - Loss: 3.0366
 - Accuracy: 0.4514
 - Num Input Tokens Seen: 1975517184
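
The added `inference: false` front-matter key is what turns off the hosted API widget, so the checkpoint has to be run locally or in the linked Colab. Below is a minimal local-inference sketch, not taken from the commit: the repo id `pszemraj/jamba-900M-v0.13-KIx2` is inferred from the card title and committer, and `trust_remote_code=True` is assumed in case the installed transformers version lacks native Jamba support.

```python
# Minimal sketch of loading the model locally, since the hosted widget is
# disabled (inference: false). Assumptions: the repo id is inferred from the
# card title/committer, and trust_remote_code=True may be needed if the
# installed transformers version has no native Jamba support.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pszemraj/jamba-900M-v0.13-KIx2"  # assumed repo id

# the card notes the Claude3 tokenizer is packaged as an hf GPT2 tokenizer,
# so it resolves through AutoTokenizer like any GPT-2-style tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# sanity-check the "smol MoE" dimensions listed in the card
cfg = model.config
print(cfg.hidden_size, cfg.num_hidden_layers)  # expected: 1024, 12

prompt = "In a shocking finding, scientists discovered"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling settings are illustrative; since the model was pretrained at context length 16384, prompts well beyond typical GPT-2-class lengths should be accepted.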