Update README.md
README.md CHANGED
@@ -5,6 +5,8 @@ license: apache-2.0
Zamba-7B-v1 is a hybrid model between Mamba, a state-space model, and transformers. It uses a Mamba backbone with a shared transformer layer every 6 blocks. Zamba was trained using next-token prediction. It uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data sourced from open web datasets. Subsequently, in a second phase, Zamba was annealed on a mixture of 50B high-quality tokens.

+ Note: the current Hugging Face implementation of Zamba runs slower than our internal implementation. We are working with the Hugging Face team to fix this.

## Quick start

### Prerequisites
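The quick-start code itself is not shown in this diff. For context, a minimal usage sketch, assuming the standard `transformers` auto classes and the `Zyphra/Zamba-7B-v1` repository id (both assumptions, not taken from the diff above), could look like:

```python
# Minimal sketch: load Zamba and generate text via the Hugging Face transformers API.
# Assumptions: the repo id "Zyphra/Zamba-7B-v1", a transformers version with Zamba
# support, and bfloat16/device_map settings suited to a GPU machine.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba-7B-v1",
    torch_dtype=torch.bfloat16,  # assumed precision; adjust to your hardware
    device_map="auto",
)

# Tokenize a prompt, generate a continuation, and decode it.
inputs = tokenizer("What factors contributed to the fall of the Roman Empire?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```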