Update README.md #2
by ivas-tri
README.md CHANGED
@@ -75,8 +75,7 @@ model-index:
 ---
 
 # Mamba-7B
-(
-This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained on 1.2T tokens of the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
+This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained on multiple epochs (1.2T tokens) of the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
 Mamba is a state-space model that, unlike the standard transformer architecture, does not use self-attention. It has shown strong performance on various natural language benchmarks. To date, the largest publicly released pure-Mamba pretrain is [Mamba-2.8B](https://huggingface.co/state-spaces/mamba-2.8b).
 We follow their training recipe and release our version of Mamba-7B.
 
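The card paragraph above says Mamba replaces self-attention with a state-space recurrence. As a rough illustration of that claim, here is a minimal sequential sketch of a selective-scan step in PyTorch. This is a simplified reference only, not the model's actual implementation: the released model uses a fused CUDA scan kernel, and the paper's gating branch and D skip connection are omitted here.

```python
import torch

def selective_scan(u, delta, A, B, C):
    """Sequential reference for a simplified Mamba-style selective scan.

    Shapes (names are illustrative, not the repo's):
      u:     (batch, length, d_model)  input sequence
      delta: (batch, length, d_model)  input-dependent step sizes (> 0)
      A:     (d_model, d_state)        state matrix (negative for stability)
      B, C:  (batch, length, d_state)  input-dependent projections
    Returns y: (batch, length, d_model)
    """
    batch, length, d_model = u.shape
    x = u.new_zeros(batch, d_model, A.shape[1])  # hidden state
    ys = []
    for t in range(length):
        dt = delta[:, t].unsqueeze(-1)                      # (batch, d_model, 1)
        dA = torch.exp(dt * A)                              # discretized transition
        dBu = dt * B[:, t].unsqueeze(1) * u[:, t].unsqueeze(-1)
        x = dA * x + dBu                                    # recurrence, no attention
        ys.append((x * C[:, t].unsqueeze(1)).sum(dim=-1))   # readout: (batch, d_model)
    return torch.stack(ys, dim=1)

# Tiny smoke test with random tensors.
b, l, d, n = 2, 5, 8, 4
u = torch.randn(b, l, d)
delta = torch.rand(b, l, d) + 0.1   # keep step sizes positive
A = -torch.rand(d, n)               # negative entries keep exp(dt * A) < 1
B = torch.randn(b, l, n)
C = torch.randn(b, l, n)
print(selective_scan(u, delta, A, B, C).shape)  # torch.Size([2, 5, 8])
```

Because the state `x` is fixed-size, inference cost per token is constant in sequence length, which is the practical contrast with self-attention the paragraph is drawing.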
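For loading a Mamba checkpoint, a hedged usage sketch: the card links the original state-spaces/mamba-2.8b weights, and a transformers-format conversion is published as state-spaces/mamba-2.8b-hf (that repo id, and transformers >= 4.39 for Mamba support, are assumptions here; this PR's 7B checkpoint may use a different loading path).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: transformers-format Mamba weights at state-spaces/mamba-2.8b-hf.
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")

inputs = tokenizer("State-space models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```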