Update README.md
README.md CHANGED
@@ -5,6 +5,8 @@ license: apache-2.0
Zamba-7B-v1-phase1 is a hybrid model combining Mamba, a state-space model, with transformers. It uses a Mamba backbone with a shared transformer layer every 6 blocks. Zamba was trained using next-token prediction and uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba-7B-v1-phase-1 was pre-trained on 1T tokens of text and code data sourced from open web datasets. Unlike Zamba-v1, this model represents the checkpoint after pure pretraining on web datasets only. We envision its use primarily as a comparison point for exploring the effects of our annealing process.
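
For intuition, the following is a minimal, purely illustrative sketch of the layout described above (a Mamba backbone with one shared transformer layer reused every 6 blocks). The class and argument names are placeholders, not the actual Zamba implementation.

```python
# Illustrative sketch only: names and internals are placeholders, not Zamba's code.
# It shows the layout described above: a stack of Mamba blocks, with a single
# *shared* transformer layer (same weights reused) applied after every 6th block.
import torch.nn as nn

class ZambaLikeBackbone(nn.Module):
    def __init__(self, num_blocks, d_model, mamba_block_cls, transformer_layer_cls):
        super().__init__()
        # One Mamba block per layer of the backbone.
        self.mamba_blocks = nn.ModuleList(
            [mamba_block_cls(d_model) for _ in range(num_blocks)]
        )
        # A single transformer layer whose weights are shared across the stack.
        self.shared_attention = transformer_layer_cls(d_model)

    def forward(self, hidden_states):
        for i, block in enumerate(self.mamba_blocks):
            hidden_states = block(hidden_states)
            # Reuse the same shared transformer layer every 6 blocks.
            if (i + 1) % 6 == 0:
                hidden_states = self.shared_attention(hidden_states)
        return hidden_states
```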
Note: the current Huggingface implementation of Zamba runs slower than our internal implementation. We are working with the Huggingface team to fix this.
## Quick start
### Prerequisites
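
The exact prerequisites are not reproduced in this diff. As a rough sketch of the usual flow once they are met (assuming a `transformers` build with Zamba support and the repository id `Zyphra/Zamba-7B-v1-phase1`, both of which should be checked against the model card), loading and generating follows the standard `transformers` pattern:

```python
# Rough sketch of the standard transformers loading/generation flow.
# The model id and the need for a Zamba-capable transformers build are
# assumptions; check the model card for the exact prerequisites.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba-7B-v1-phase1"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The Mamba architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```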