OSainz committed
Commit e00e141 (parent: 96f5f5f)

Update README.md

Files changed (1): README.md (+9 -7)
README.md CHANGED
@@ -24,7 +24,7 @@ model-index:
          value: 69.76
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
    - task:
        type: multiple-choice
      dataset:
@@ -36,7 +36,7 @@ model-index:
          value: 64.89
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
    - task:
        type: mix
      dataset:
@@ -48,7 +48,7 @@ model-index:
          value: 61.66
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
    - task:
        type: multiple_choice
      dataset:
@@ -60,7 +60,7 @@ model-index:
          value: 60.61
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
    - task:
        type: multiple_choice
      dataset:
@@ -72,7 +72,7 @@ model-index:
          value: 53.69
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
    - task:
        type: multiple_choice
      dataset:
@@ -84,7 +84,7 @@ model-index:
          value: 61.52
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
    - task:
        type: multiple_choice
      dataset:
@@ -96,7 +96,7 @@ model-index:
          value: 54.48
        source:
          name: Paper
-         url: https://paper-url.com
+         url: https://arxiv.org/abs/2403.20266
  ---
  
  # **Model Card for Latxa 70b**
@@ -105,6 +105,8 @@ model-index:
  <img src="https://github.com/hitz-zentroa/latxa/blob/b9aa705f60ee2cc03c9ed62fda82a685abb31b07/assets/latxa_round.png?raw=true" style="height: 350px;">
  </p>
  
+ <span style="color: red; font-weight: bold">IMPORTANT:</span> This model is outdated and made available publicly for reproducibility purposes only. Please use the most recent version found in [our HuggingFace collection](https://huggingface.co/collections/HiTZ/latxa-65a697e6838b3acc53677304).
+
  We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque corpus comprising 4.3M documents and 4.2B tokens. In our extensive evaluation, Latxa outperforms all previous open models we compare to by a large margin. In addition, it is competitive with GPT-4 Turbo in language proficiency and understanding, despite lagging behind in reading comprehension and knowledge-intensive tasks. Both the Latxa family of models, as well as our new pretraining corpora and evaluation datasets, are publicly available under open licenses. Our suite enables reproducible research on methods to build LLMs for low-resource languages.
  
  - 📒 Blog Post: [Latxa: An Open Language Model and Evaluation Suite for Basque](https://www.hitz.eus/en/node/340)