sedrickkeh and ivas-tri committed
Commit accd6ba
1 Parent(s): 443ad2e

Update README.md (#2)


- Update README.md (44279721e2370a594bfb4ec576baef6d95dfdc82)


Co-authored-by: Igor Vasiljevic <ivas-tri@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +1 -2
README.md CHANGED
@@ -75,8 +75,7 @@ model-index:
 ---
 
 # Mamba-7B
-(insert cool midjourney pic here?)<br>
-This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained on 1.2T tokens of the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
+This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained on multiple epochs (1.2T tokens) of the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
 Mamba is a state-space model that does not use self-attention unlike the standard transformer architecture. It has shown strong performance on various natural language benchmarks. To date, the largest publicly released pure-Mamba pretrain is [Mamba-2.8B](https://huggingface.co/state-spaces/mamba-2.8b).
 We follow their training recipe and release our version of Mamba-7B.
 
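
To make the architectural contrast in the README text concrete, below is a minimal, illustrative sketch of the discretized linear state-space recurrence that Mamba-style layers use in place of self-attention. This is not the released model's implementation (the actual Mamba block uses learned, input-dependent parameters and a hardware-aware parallel scan, per the linked paper), and all names and dimensions here are hypothetical.

```python
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Run a discretized linear state-space recurrence over a sequence.

    x:     (seq_len, d_model)  input sequence
    A_bar: (d_state, d_state)  discretized state transition
    B_bar: (d_state, d_model)  input-to-state projection
    C:     (d_model, d_state)  state-to-output readout
    Returns y with the same shape as x.
    """
    h = np.zeros(A_bar.shape[0])
    y = np.empty_like(x)
    for t in range(x.shape[0]):          # sequential scan: O(seq_len), fixed-size state
        h = A_bar @ h + B_bar @ x[t]     # state update (no attention over past tokens)
        y[t] = C @ h                     # readout
    return y

# Toy usage with hypothetical sizes (d_model=4, d_state=16)
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
y = ssm_scan(x,
             A_bar=0.9 * np.eye(16),
             B_bar=0.1 * rng.standard_normal((16, 4)),
             C=0.1 * rng.standard_normal((4, 16)))
print(y.shape)  # (8, 4)
```

The point of the sketch is only that the scan carries a fixed-size state and runs in time linear in sequence length, whereas transformer self-attention compares every token against every other token.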