Update README.md #2
by ivas-tri
README.md CHANGED
@@ -75,8 +75,7 @@ model-index:
 ---
 
 # Mamba-7B
-(
-This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained on 1.2T tokens of the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
+This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained on multiple epochs (1.2T tokens) of the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
 Mamba is a state-space model that, unlike the standard transformer architecture, does not use self-attention. It has shown strong performance on various natural language benchmarks. To date, the largest publicly released pure-Mamba pretrain is [Mamba-2.8B](https://huggingface.co/state-spaces/mamba-2.8b).
 We follow their training recipe and release our version of Mamba-7B.
 
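The card paragraph above says Mamba replaces self-attention with a state-space recurrence. As a rough illustration of that claim, here is a minimal sequential sketch of a selective-scan step in PyTorch. This is a simplified reference only, not the model's actual implementation: the released model uses a fused CUDA scan kernel, and the paper's gating branch and D skip connection are omitted here.

```python
import torch

def selective_scan(u, delta, A, B, C):
    """Sequential reference for a simplified Mamba-style selective scan.

    Shapes (names are illustrative, not the repo's):
      u:     (batch, length, d_model)  input sequence
      delta: (batch, length, d_model)  input-dependent step sizes (> 0)
      A:     (d_model, d_state)        state matrix (negative for stability)
      B, C:  (batch, length, d_state)  input-dependent projections
    Returns y: (batch, length, d_model)
    """
    batch, length, d_model = u.shape
    x = u.new_zeros(batch, d_model, A.shape[1])  # hidden state
    ys = []
    for t in range(length):
        dt = delta[:, t].unsqueeze(-1)                      # (batch, d_model, 1)
        dA = torch.exp(dt * A)                              # discretized transition
        dBu = dt * B[:, t].unsqueeze(1) * u[:, t].unsqueeze(-1)
        x = dA * x + dBu                                    # recurrence, no attention
        ys.append((x * C[:, t].unsqueeze(1)).sum(dim=-1))   # readout: (batch, d_model)
    return torch.stack(ys, dim=1)

# Tiny smoke test with random tensors.
b, l, d, n = 2, 5, 8, 4
u = torch.randn(b, l, d)
delta = torch.rand(b, l, d) + 0.1   # keep step sizes positive
A = -torch.rand(d, n)               # negative entries keep exp(dt * A) < 1
B = torch.randn(b, l, n)
C = torch.randn(b, l, n)
print(selective_scan(u, delta, A, B, C).shape)  # torch.Size([2, 5, 8])
```

Because the state `x` is fixed-size, inference cost per token is constant in sequence length, which is the practical contrast with self-attention the paragraph is drawing.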
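For loading a Mamba checkpoint, a hedged usage sketch: the card links the original state-spaces/mamba-2.8b weights, and a transformers-format conversion is published as state-spaces/mamba-2.8b-hf (that repo id, and transformers >= 4.39 for Mamba support, are assumptions here; this PR's 7B checkpoint may use a different loading path).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: transformers-format Mamba weights at state-spaces/mamba-2.8b-hf.
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")

inputs = tokenizer("State-space models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```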