christian-dynamofl committed
Commit
9bd9276
1 Parent(s): aa444a1

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
```diff
@@ -14,7 +14,7 @@ language:
 
 # Dynamo 8B Model Card
 
-Dynamo 8B is an improvement of the Mistral-7B architecture for the purpose of multilingual language modeling. Dynamo 8B outperforms Mistral 7B, Llama2 13B, Bloom 7B, and PolyLM 13B on most multilingual benchmarks we tested (i.e. PAWS and XCOPA). For additional details, please refer to our [blog post](https://www.dynamofl.com/blogs/introducing-dynamo-8b-a-multilingual-foundation-model-for-global-enterprises).
+Dynamo 8B is an improvement of the Mistral 7B architecture for the purpose of multilingual language modeling. Dynamo 8B outperforms Mistral 7B, Llama2 13B, Bloom 7B, and PolyLM 13B on most multilingual benchmarks we tested (i.e. PAWS and XCOPA). For additional details, please refer to our [blog post](https://www.dynamofl.com/blogs/introducing-dynamo-8b-a-multilingual-foundation-model-for-global-enterprises).
 
 It includes an extended tokenizer that was pretrained to better leverage tokens in different languages. The tokenizer was extended by training a sentence BPE tokenizer on selected languages (200M tokens were used per language) and then combined the merges/vocab that were not already present in the Mistral tokenizer. After the tokenizers were merged, the model was pretrained with an additional 210B tokens from multilingual data like German, Spanish, Korean, Italian, and Turkish texts. The pretraining dataset also incorporated English tokens to mitigate catastrophic forgetting.
 
```
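The tokenizer-extension step described in the README (training a BPE tokenizer on the added languages, keeping only the vocab/merges not already in the Mistral tokenizer, then continuing pretraining) could be approximated as in the sketch below. This is a minimal illustration under stated assumptions, not the actual Dynamo 8B pipeline: the corpus file names are placeholders, `add_tokens` plus an embedding resize stands in for a true merge of BPE vocab/merge tables, and `mistralai/Mistral-7B-v0.1` is used only as an example base checkpoint.

```python
# Hypothetical sketch of extending a Mistral-style tokenizer with new languages.
# Assumptions (not from the model card): file paths, vocab size, base model ID,
# and the simplified merge strategy are illustrative only.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Train a BPE tokenizer on the additional multilingual corpora
#    (e.g. German, Spanish, Korean, Italian, Turkish text files).
bpe = Tokenizer(models.BPE(unk_token="<unk>"))
bpe.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=32_000, special_tokens=["<unk>"])
bpe.train(files=["de.txt", "es.txt", "ko.txt", "it.txt", "tr.txt"], trainer=trainer)

# 2. Keep only the tokens the base Mistral tokenizer does not already contain.
#    (Naive string comparison; a real merge would also reconcile the
#    SentencePiece "▁" prefix convention and the BPE merge tables.)
base_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
existing = set(base_tok.get_vocab())
new_tokens = [tok for tok in bpe.get_vocab() if tok not in existing]
num_added = base_tok.add_tokens(new_tokens)

# 3. Resize the embedding matrix so the new vocabulary entries have rows
#    that can be trained during the continued multilingual pretraining stage.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model.resize_token_embeddings(len(base_tok))
print(f"Added {num_added} tokens; new vocab size: {len(base_tok)}")
```

After the resize, the new embedding rows start out untrained, which is why the card describes a further pretraining stage on roughly 210B multilingual tokens, with English data mixed in to mitigate catastrophic forgetting.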