ssmits/Falcon2-5.5B-Dutch


ssmits committed on Jun 5

Commit f765d7a • 1 Parent(s): 03e87f8

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -13,9 +13,9 @@ language:
 ## Why prune?
-Falcon-11B is still undertrained, as can be seen by this graph:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/QeaL9bOrPskustzFpjMUP.png)
-This is why the choice is made by prune 50% of the layers.
 Note that \~1B of continued pre-training (\~1M rows of 1k tokens) is still required to restore the perplexity of this model in the desired language.
 I'm planning on doing that for certain languages, depending on how much compute will be available.

 ## Why prune?
+Even though [Falcon-11B](https://huggingface.co/tiiuae/falcon-11B) is trained on 5T tokens, it is still undertrained, as can be seen by this graph:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/QeaL9bOrPskustzFpjMUP.png)
+This is why the choice is made to prune 50% of the layers.
 Note that \~1B of continued pre-training (\~1M rows of 1k tokens) is still required to restore the perplexity of this model in the desired language.
 I'm planning on doing that for certain languages, depending on how much compute will be available.