Cyrile committed on
Commit 000f9a7
1 Parent(s): f8661a4

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -31,7 +31,7 @@ Sans honneur que précaire, sans liberté que provisoire, [...], et de façon qu
 
 | model | GPT 3.5 | Boris | Flan-T5 | LLaMA | Dolly | MPT | Falcon | Bloomz |
 |:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:|
-| tokens by word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
+| tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
 
 
 For comparison, with a specialized French tokenizer like [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](cmarkea/distilcamembert-base), we have 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, there has already been a demonstrated direct relationship between the number of tokens per word required for modeling and the predictive performance of the model [1].
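The renamed row reports the tokens-per-word ratio each tokenizer produces on French text. As a minimal sketch of how such a ratio can be measured, assuming the `transformers` library, the model id taken from the README link, and whitespace splitting as a crude word count (the sample sentence and the counting method are illustrative assumptions, not the procedure used to build the table):

```python
from transformers import AutoTokenizer

# Model id from the README link; CamemBERT is the French-specialized
# tokenizer cited as reaching about 1.5 tokens per word.
tokenizer = AutoTokenizer.from_pretrained("camembert/camembert-base")

# Illustrative French sentence (assumption: any representative French text works).
text = "Sans honneur que précaire, sans liberté que provisoire."

n_words = len(text.split())               # crude word count via whitespace
n_tokens = len(tokenizer.tokenize(text))  # subword tokens produced by the tokenizer

print(f"{n_tokens} tokens / {n_words} words = {n_tokens / n_words:.2f} tokens per word")
```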