Update README.md
README.md
CHANGED
@@ -14,6 +14,7 @@ datasets:
 - BEE-spoke-data/fineweb-1M_longish
 language:
 - en
+inference: false
 ---
 
 # jamba-900M-v0.13-KIx2
@@ -22,14 +23,18 @@ language:
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
 </a>
 
-
+> The API widget is off as it isn't supported by hf yet - try the Colab
+
+This is a pretraining experiment on the `jamba` arch as a "smol MoE".
+
+Details:
 
 - pretrained at context length 16384
 - seen approx 20b tokens
 - uses Claude3 tokenizer (as hf GPT2 tokenizer)
 - hidden size 1024, 12 layers, 8 experts
 
-
+achieves the following results on the evaluation set (_ of the latest dataset_):
 - Loss: 3.0366
 - Accuracy: 0.4514
 - Num Input Tokens Seen: 1975517184