LVouk committed
Commit 2d6bec7
1 Parent(s): 01b039d

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

@@ -16,7 +16,7 @@ Meltemi is built on top of [Mistral-7B](https://huggingface.co/mistralai/Mistral
 # Model Information
 
 - Vocabulary extension of the Mistral-7B tokenizer with Greek tokens
-- Trained with 8k context length
+- 8192 context length
 - We extend the pretraining of Mistral-7B with added proficiency for the Greek language, by utilizing a large corpus consisting of approximately **40 billion tokens**.
 * This corpus includes 28.5 billion monolingual Greek tokens, constructed from publicly available resources. Additionaly, to mitigate catastrophic forgetting and ensure that the model has bilingual capabilities, we use additional sub-corpora with monolingual English texts (10.5 billion tokens) and Greek-English parallel data (600 million tokens).
 * This corpus has been processed, filtered, and deduplicated to ensure data quality (a detailed description of our data processing pipeline will be published in our upcoming paper) and is outlined below:
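The sub-corpus sizes quoted in the diff can be checked against the stated ~40 billion token total with a quick sanity calculation (a minimal sketch; the figures are taken directly from the README bullets above):

```python
# Sanity-check the corpus arithmetic stated in the README:
# 28.5B monolingual Greek + 10.5B monolingual English + 0.6B parallel
greek_tokens = 28.5e9
english_tokens = 10.5e9
parallel_tokens = 0.6e9

total = greek_tokens + english_tokens + parallel_tokens
print(f"{total / 1e9:.1f}B tokens")  # 39.6B, consistent with "approximately 40 billion"
```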