Update README.md
README.md
CHANGED
@@ -8,9 +8,9 @@ language:
 
 This model is pretrained Based model.
 
-As a quality reference, we include a pretrained Mamba model provided here: https://huggingface.co/hazyresearch/mamba-1b-50b
+As a quality reference, we include a pretrained Mamba model provided here: https://huggingface.co/hazyresearch/mamba-1b-50b and a pretrained attention (Llama architecture) model provided here: https://huggingface.co/hazyresearch/attn-1b-50bn
 
-
+All three checkpoints are pretrained on **50Bn tokens** of the Pile in the exact same data order using next token prediction.
 
 A WandB report for training is here: https://api.wandb.ai/links/hazy-research/ggo9rst2
 