Tags: Text Generation · Transformers · Safetensors · English · falcon_mamba · Inference Endpoints · 4-bit precision · bitsandbytes
JingweiZuo committed
Commit 00f217f
1 Parent(s): cd26dd8

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -91,7 +91,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Data
 
-Falcon-Mamba has been trained with ~ 6,000 GT mainly coming from [Refined-Web](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large volume web-only dataset filtered and deduplicated.
+Falcon-Mamba has been trained with ~ 5,500 GT mainly coming from [Refined-Web](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large volume web-only dataset filtered and deduplicated.
 Similar to the others [Falcon](https://huggingface.co/tiiuae/falcon-11B) suite models, Falcon-Mamba has been trained leveraging a multi-stage training strategy to increase the context-length training from 2,048 up to 8,192.
 Note that at inference the context-length is not relevant as the Mamba architecture has no limit on long range dependency.
 At the last training stage, small portion of high-quality curated data was used to further enhance performance.
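
Since the edited section describes multi-stage context-length training (2,048 up to 8,192 tokens) and unbounded context at inference, a minimal usage sketch may help readers try the model as described. This is a sketch only, not part of this commit: the model id `tiiuae/falcon-mamba-7b` and the 4-bit bitsandbytes settings are assumptions based on the repository tags above.

```python
# Minimal sketch (assumed setup, not part of this commit): load Falcon-Mamba in 4-bit
# via bitsandbytes and generate, ending with the `print(tokenizer.decode(outputs[0]))`
# call referenced in the diff hunk header. Assumes a recent transformers release with
# FalconMamba support, plus accelerate and bitsandbytes installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-mamba-7b"  # assumed checkpoint id; adjust to the actual repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,                      # 4-bit weights, as the repo tags suggest
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    ),
    device_map="auto",
)

# The Mamba architecture imposes no fixed context window at inference,
# so prompts are not constrained to the 8,192-token training length.
prompt = "The RefinedWeb dataset is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```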