utter-project
/

EuroLLM-1.7B-Instruct

@@ -1,59 +1,166 @@
 ---
-library_name: transformers
-base_model: Unbabel/EuroLLM-1B-annealed-fw
-tags:
-- generated_from_trainer
-model-index:
-- name: experiments/EuroLLM/EuroLLM-1B-annealed-fw-euroblocks-v0.1
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
-# experiments/EuroLLM/EuroLLM-1B-annealed-fw-euroblocks-v0.1
-This model is a fine-tuned version of [Unbabel/EuroLLM-1B-annealed-fw](https://huggingface.co/Unbabel/EuroLLM-1B-annealed-fw) on the None dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 7e-06
-- train_batch_size: 2
-- eval_batch_size: 1
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 2
-- gradient_accumulation_steps: 24
-- total_train_batch_size: 96
-- total_eval_batch_size: 2
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 300
-- num_epochs: 4
-### Training results
-### Framework versions
-- Transformers 4.44.2
-- Pytorch 2.2.2+cu121
-- Datasets 2.19.1
-- Tokenizers 0.19.1

 ---
+license: apache-2.0
+language:
+- en
+- de
+- es
+- fr
+- it
+- pt
+- pl
+- nl
+- tr
+- sv
+- cs
+- el
+- hu
+- ro
+- fi
+- uk
+- sl
+- sk
+- da
+- lt
+- lv
+- et
+- bg
+- no
+- ca
+- hr
+- ga
+- mt
+- gl
+- zh
+- ru
+- ko
+- ja
+- ar
+- hi
 ---
+## *Model updated on September 24*
+# Model Card for EuroLLM-1.7B-Instruct
+This is the model card for the first instruction tuned model of the EuroLLM series: EuroLLM-1.7B-Instruct. You can also check the pre-trained version: [EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B).
+- **Developed by:** Unbabel, Instituto Superior Técnico, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
+- **Funded by:** European Union.
+- **Model type:** A 1.7B parameter instruction tuned multilingual transfomer LLM.
+- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
+- **License:** Apache License 2.0.
+## Model Details
+The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
+EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets.
+EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with focus on general instruction-following and machine translation.
+### Model Description
+EuroLLM uses a standard, dense Transformer architecture:
+- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
+- We perform pre-layer normalization, since it improves the training stability, and use the RMSNorm, which is faster.
+- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
+- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performances while allowing the extension of the context length.
+For pre-training, we use 256 Nvidia H100 GPUs of the Marenostrum 5 supercomputer, training the model with a constant batch size of 3,072 sequences, which corresponds to approximately 12 million tokens, using the Adam optimizer, and BF16 precision.
+Here is a summary of the model hyper-parameters:
+|                                      |                      |
+|--------------------------------------|----------------------|
+| Sequence Length                      | 4,096                |
+| Number of Layers                     | 24                   |
+| Embedding Size                       | 2,048                |
+| FFN Hidden Size                      | 5,632                |
+| Number of Heads                      | 16                   |
+| Number of KV Heads (GQA)             | 8                    |
+| Activation Function                  | SwiGLU               |
+| Position Encodings                   | RoPE (\Theta=10,000) |
+| Layer Norm                           | RMSNorm              |
+| Tied Embeddings                      | No                   |
+| Embedding Parameters                 | 0.262B               |
+| LM Head Parameters                   | 0.262B               |
+| Non-embedding Parameters             | 1.133B               |
+| Total Parameters                     | 1.657B               |
+## Run the model
+    from transformers import AutoModelForCausalLM, AutoTokenizer
+    model_id = "utter-project/EuroLLM-1.7B-Instruct"
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    model = AutoModelForCausalLM.from_pretrained(model_id)
+    text = '<|im_start|>system\n<|im_end|>\n<|im_start|>user\nTranslate the following English source text to Portuguese:\nEnglish: I am a language model for european languages. \nPortuguese: <|im_end|>\n<|im_start|>assistant\n'
+    inputs = tokenizer(text, return_tensors="pt")
+    outputs = model.generate(**inputs, max_new_tokens=20)
+    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+## Results
+### Machine Translation
+We evaluate EuroLLM-1.7B-Instruct on several machine translation benchmarks: FLORES-200, WMT-23, and WMT-24 comparing it with [Gemma-2B](https://huggingface.co/google/gemma-2b) and [Gemma-7B](https://huggingface.co/google/gemma-7b) (also instruction tuned on EuroBlocks).
+The results show that EuroLLM-1.7B is substantially better than Gemma-2B in Machine Translation and competitive with Gemma-7B.
+#### Flores-200
+| Model                          | AVG  | AVG en-xx | AVG xx-en | en-ar | en-bg | en-ca | en-cs | en-da | en-de | en-el | en-es-latam | en-et | en-fi | en-fr | en-ga | en-gl | en-hi | en-hr | en-hu | en-it | en-ja | en-ko | en-lt | en-lv | en-mt | en-nl | en-no | en-pl | en-pt-br | en-ro | en-ru | en-sk | en-sl | en-sv | en-tr | en-uk | en-zh-cn | ar-en | bg-en | ca-en | cs-en | da-en | de-en | el-en | es-latam-en | et-en | fi-en | fr-en | ga-en | gl-en | hi-en | hr-en | hu-en | it-en | ja-en | ko-en | lt-en | lv-en | mt-en | nl-en | no-en | pl-en | pt-br-en | ro-en | ru-en | sk-en | sl-en | sv-en | tr-en | uk-en | zh-cn-en |
+|--------------------------------|------|-----------|-----------|-------|-------|-------|-------|-------|-------|-------|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|----------|
+| EuroLLM-1.7B-Instruct     |86.89 |	86.53 |	87.25 | 85.17 | 89.42 | 84.72 | 89.13 | 89.47 | 86.90 | 87.60 | 86.29       | 88.95 | 89.40 | 87.69 | 74.89 | 86.41 | 76.92 | 84.79 | 86.78 | 88.17 | 89.76 | 87.70 | 87.27 | 87.62 | 67.84 | 87.10 | 90.00 | 88.18 | 89.29    | 89.49 | 88.32 | 88.18 | 86.85 | 90.00 | 87.31 | 87.89 | 86.60    | 86.34 | 87.45 | 87.57 | 87.95 | 89.72 | 88.80 | 87.00 | 86.77         | 88.34 | 89.09 | 88.95 | 82.69 | 87.80 | 88.37 | 86.71 | 87.20 | 87.81 | 86.79 | 86.79 | 85.62 | 86.48 | 81.10 | 86.97 | 90.25    | 85.75 | 89.20 | 88.88 | 86.00 | 87.38 | 86.76 | 89.61 | 87.94    |
+| Gemma-2B-EuroBlocks       | 81.59 |	78.97 |	84.21 | 76.68 | 82.73 | 83.14 | 81.63 | 84.63 | 83.15 | 79.42 | 84.05       | 72.58 | 79.73 | 84.97 | 40.50 | 82.13 | 67.79 | 80.53 | 78.36 | 84.90 | 87.43 | 82.98 | 72.29 | 68.68 | 58.55 | 83.13 | 86.15 | 82.78 | 86.79    | 83.14 | 84.61 | 78.18 | 75.37 | 80.89 | 78.38 | 84.38 | 84.35    | 83.88 | 85.77 | 86.85 | 86.31 | 88.24 | 88.12 | 84.79 | 84.90         | 82.51 | 86.32 | 88.29 | 54.78 | 86.53 | 85.83 | 85.41 | 85.18 | 86.77 | 85.78 | 84.99 | 81.65 | 81.78 | 67.27 | 85.92 | 89.07    | 84.14 | 88.07 | 87.17 | 85.23 | 85.09 | 83.95 | 87.57 | 84.77    |
+| Gemma-7B-EuroBlocks       |85.27 |	83.90 |	86.64 | 86.38 | 87.87 | 85.74 | 84.25 | 85.69       | 81.49 | 85.52 | 86.93 | 62.83 | 84.96 | 75.34 | 84.93 | 83.91 | 86.92 | 88.19 | 86.11 | 81.73 | 80.55 | 66.85 | 85.31 | 89.36 | 85.87 | 88.62    | 88.06 | 86.67 | 84.79 | 82.71 | 86.45 | 85.19 | 86.67 | 85.77    | 86.36 | 87.21 | 88.09 | 87.17 | 89.40 | 88.26 | 86.74 | 86.73         | 87.25 | 88.87 | 88.81 | 72.45 | 87.62 | 87.86 | 87.08 | 87.01 | 87.58 | 86.92 | 86.70 | 85.10 | 85.74 | 77.81 | 86.83 | 90.40    | 85.41 | 89.04 | 88.77 | 86.13 | 86.67 | 86.32 | 89.27 | 87.92    |
+#### WMT-23
+| Model                          | AVG  | AVG en-xx | AVG xx-en | AVG xx-xx | en-de | en-cs | en-uk | en-ru | en-zh-cn | de-en | uk-en | ru-en | zh-cn-en | cs-uk |
+|--------------------------------|------|-----------|-----------|-----------|-------|-------|-------|-------|----------|-------|-------|-------|----------|-------|
+| EuroLLM-1.7B-Instruct   | 82.91     | 83.20     | 81.77     | 86.82     | 81.56     | 85.23     | 81.30     | 82.47     | 83.61     | 85.03      | 84.06      | 85.25      | 81.31      | 78.83      | 79.42      | 86.82      |
+| Gemma-2B-EuroBlocks       | 79.96     | 79.01     | 80.86     | 81.15     | 76.82     | 76.05     | 77.92     | 78.98     | 81.58     | 82.73      | 82.71      | 83.99      | 80.35      | 78.27      | 78.99      | 81.15      |
+| Gemma-7B-EuroBlocks       | 82.76     | 82.26     | 82.70     | 85.98     | 81.37     | 82.42     | 81.54     | 82.18     | 82.90     | 83.17      | 84.29      | 85.70      | 82.46      | 79.73      | 81.33      | 85.98      |
+#### WMT-24
+| Model | AVG | AVG en-xx | AVG xx-xx | en-de | en-es-latam | en-cs | en-ru | en-uk | en-ja | en-zh-cn | en-hi | cs-uk | ja-zh-cn |
+|---------|------|------|-------|----|---|-------|-------|--------|--------|-------|-------|-------|-----|
+| EuroLLM-1.7B-Instruct|79.32     | 79.32     | 79.34     | 79.42     | 80.67     | 80.55     | 78.65     | 80.12     | 82.96     | 80.60      | 71.59      | 83.48      | 75.20      |
+|Gemma-2B-EuroBlocks| 74.72     | 74.41     | 75.97     | 74.93     | 78.81     | 70.54     | 74.90     | 75.84     | 79.48     | 78.06      | 62.70      | 79.87      | 72.07      |
+|Gemma-7B-EuroBlocks| 78.67     | 78.34     | 80.00     | 78.88     | 80.47     | 78.55     | 78.55     | 80.12     | 80.55     | 78.90      | 70.71      | 84.33      | 75.66      |
+### General Benchmarks
+We also compare EuroLLM-1.7B with [TinyLlama-v1.1](https://huggingface.co/TinyLlama/TinyLlama_v1.1) and [Gemma-2B](https://huggingface.co/google/gemma-2b) on 3 general benchmarks: Arc Challenge and Hellaswag.
+For the non-english languages we use the [Okapi](https://aclanthology.org/2023.emnlp-demo.28.pdf) datasets.
+Results show that EuroLLM-1.7B is superior to TinyLlama-v1.1 and similar to Gemma-2B on Hellaswag but worse on Arc Challenge. This can be due to the lower number of parameters of EuroLLM-1.7B (1.133B non-embedding parameters against 1.981B).
+#### Arc Challenge
+| Model              | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi  | Hungarian | Romanian | Ukrainian | Danish | Catalan |
+|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|---------|-------|--------|---------|--------|-----------|----------|-----------|--------|---------|
+| EuroLLM-1.7B | 0.3496    | 0.4061    | 0.3464    | 0.3684    | 0.3627    | 0.3738    | 0.3855    | 0.3521    | 0.3208    | 0.3507     | 0.3045     | 0.3605     | 0.2928     | 0.3271     | 0.3488     | 0.3516     | 0.3513     | 0.3396     |
+| TinyLlama-v1.1       | 0.2650    | 0.3712  | 0.2524  | 0.2795  | 0.2883  | 0.2652  | 0.2906  | 0.2410  | 0.2669   | 0.2404   | 0.2310   | 0.2687   | 0.2354   | 0.2449   | 0.2476   | 0.2524   | 0.2494   | 0.2796   |
+| Gemma-2B             | 0.3617    | 0.4846  | 0.3755  | 0.3940  | 0.4080  | 0.3687  | 0.3872  | 0.3726  | 0.3456   | 0.3328   | 0.3122   | 0.3519   | 0.2851   | 0.3039   | 0.3590   | 0.3601   | 0.3565   | 0.3516   |
+#### Hellaswag
+| Model              | Average | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch  | Arabic | Swedish | Hindi  | Hungarian | Romanian | Ukrainian | Danish | Catalan |
+|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|--------|--------|---------|--------|-----------|----------|-----------|--------|---------|
+| EuroLLM-1.7B | 0.4744  | 0.4760    | 0.6057    | 0.4793    | 0.5337    | 0.5298    | 0.5085    | 0.5224    | 0.4654    | 0.4949    | 0.4104     | 0.4800     | 0.3655     | 0.4097     | 0.4606     | 0.436      | 0.4702     | 0.4445     |
+| TinyLlama-v1.1       |0.3674   | 0.6248  | 0.3650  | 0.4137  | 0.4010  | 0.3780  | 0.3892  | 0.3494  | 0.3588   | 0.2880   | 0.3561   | 0.2841   | 0.3073   | 0.3267   | 0.3349   | 0.3408   | 0.3613   |
+| Gemma-2B             |0.4666  | 0.7165  | 0.4756  | 0.5414  | 0.5180  | 0.4841  | 0.5081  | 0.4664  | 0.4655   | 0.3868   | 0.4383   | 0.3413   | 0.3710   | 0.4316   | 0.4291   | 0.4471   | 0.4448   |
+## Bias, Risks, and Limitations
+EuroLLM-1.7B-Instruct has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).