tiiuae/falcon-mamba-7b

JingweiZuo committed on Jul 24

Commit 02688fc • 1 Parent(s): 124f971

Update README.md

Files changed (1)

README.md +2 -2

@@ -154,8 +154,8 @@ print(tokenizer.decode(outputs[0]))
 ## Training Data
 Falcon-Mamba has been trained with ~ 5,500 GT mainly coming from [Refined-Web](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large volume web-only dataset filtered and deduplicated.
-Similar to the others [Falcon](https://huggingface.co/tiiuae/falcon-11B) suite models, Falcon-Mamba has been trained leveraging a multi-stage training strategy to increase the context-length (from 2,048 to 8,192).
-Moreover, inspired by the Curriculum Learning concept, we carefully choose data mixtures along the training stages, on both data diversity and complexity.
+Similar to the others [Falcon](https://huggingface.co/tiiuae/falcon-11B) suite models, Falcon-Mamba has been trained leveraging a multi-stage training strategy to increase the context-length from 2,048 to 8,192.
+Moreover, inspired by the concept of Curriculum Learning, we carefully selected data mixtures throughout the training stages, considering both data diversity and complexity.
 Note that at inference the context-length is not relevant as the Mamba architecture has no limit on long range dependency.
 At the last training stage, small portion of high-quality curated data was used to further enhance performance.