Model Card for HiTZ/Latxa-Llama-3.1-70B-Instruct-v2

We introduce Latxa 3.1 70B Instruct V2, an instructed version of Latxa. This new Latxa is based on Llama-3.1 (Instruct), which we trained on an extended Basque corpus (Etxaniz et al., 2024).

Our experimentation shows that Latxa 3.1 70B Instruct outperforms Llama-3.1-Instruct by a large margin on Basque standard benchmarks, and particularly, on chat conversations. In addition, we organized a public arena-based evaluation, on which Latxat competed against other baselines and proprietary models such as GPT-4o and Claude Sonnet. The results showed that Latxa ranked 3rd, just behind Claude and GPT-4 and above all the other same-size competitors. The official paper is coming soon.

Model Details

Model Description

Latxa is a family of Large Language Models (LLM) based on Meta’s LLaMA models. Current LLMs exhibit incredible performance for high-resource languages such as English, but, in the case of Basque and other low-resource languages, their performance is close to a random guesser. These limitations widen the gap between high- and low-resource languages when it comes to digital development. We present Latxa to overcome these limitations and promote the development of LLM-based technology and research for the Basque language. Latxa models follow the same architecture as their original counterparts and were further trained in Latxa Corpus v1.1, a high-quality Basque corpora.

  • Developed by: HiTZ Research Center & IXA Research group (University of the Basque Country UPV/EHU)
  • Model type: Language model
  • Language(s) (NLP): eu
  • License: llama3.1
  • Parent model: meta-llama/Llama-3.1-70B-Instruct
  • Contact: hitz@ehu.eus

Getting Started

Use the code below to get started with the model.

from transformers import pipeline

pipe = pipeline('text-generation', model='HiTZ/Latxa-Llama-3.1-70B-Instruct')

messages = [
    {'role': 'user', 'content': 'Kaixo!'},
]

pipe(messages)

>>
[
  {
    'generated_text': [
      {'role': 'user', 'content': 'Kaixo!'},
      {'role': 'assistant', 'content': 'Kaixo! Zer moduz? Zer behar edo galdetu nahi duzu?'}
    ]
  }
]

Uses

Latxa models are intended to be used with Basque data; for any other language the performance is not guaranteed. Same as the original, Latxa inherits the Llama-3.1 License which allows for commercial and research use.

Direct Use

Latxa Instruct models are trained to follow instructions or to work as chat assistants.

Out-of-Scope Use

The model is not intended for malicious activities, such as harming others or violating human rights. Any downstream application must comply with current laws and regulations. Irresponsible usage in production environments without proper risk assessment and mitigation is also discouraged.

Bias, Risks, and Limitations

In an effort to alleviate the potentially disturbing or harmful content, Latxa has been trained on carefully selected and processed data which comes mainly from local media, national/regional newspapers, encyclopedias and blogs (see Latxa Corpus v1.1). Still, the model is based on Llama 3.1 models and can potentially carry the same bias, risk and limitations. Please see the Llama’s Ethical Considerations and Limitations for further information.

Training Details

DISCLAIMER

Further training details will be released with the corresponding research paper in the near future.

Evaluation

We evaluated the models 5-shot settings on multiple-choice tasks. We used the basque partitions of each dataset.

The arena results will be released in the future.

Testing Data, Factors & Metrics

Testing Data

  • Belebele (Bandarkar et al.): Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. We evaluated the model in a 5-shot fashion.
  • X-StoryCloze (Lin et al.): XStoryCloze consists of the professionally translated version of the English StoryCloze dataset to 10 non-English languages. Story Cloze is a commonsense reasoning dataset which consists of choosing the correct ending to a four-sentence story. We evaluated the model in a 5-shot fashion.
  • EusProficiency (Etxaniz et al., 2024): EusProficiency comprises 5,169 exercises on different topics from past EGA exams, the official C1-level certificate of proficiency in Basque.
  • EusReading (Etxaniz et al., 2024): EusReading consists of 352 reading comprehension exercises (irakurmena) sourced from the same set of past EGA exams.
  • EusTrivia (Etxaniz et al., 2024): EusTrivia consists of 1,715 trivia questions from multiple online sources. 56.3% of the questions are elementary level (grades 3-6), while the rest are considered challenging.
  • EusExams (Etxaniz et al., 2024): EusExams is a collection of tests designed to prepare individuals for Public Service examinations conducted by several Basque institutions, including the public health system Osakidetza, the Basque Government, the City Councils of Bilbao and Gasteiz, and the University of the Basque Country (UPV/EHU).

Metrics

We use Accuracy, as they are framed as Multiple Choice questions.

Results

Task Llama-3.1 8B Ins. Latxa 3.1 8B Ins. Llama-3.1 70B Ins. Latxa 3.1 70B Ins. Latxa 3.1 70B Ins. V2
Belebele 73.89 80.00 89.11 91.00 90.7
X-Story Cloze 61.22 71.34 69.69 77.83 78.66
EusProficiency 34.13 52.83 43.59 68.00 73.30
EusReading 49.72 62.78 72.16 78.98 -
EusTrivia 45.01 61.05 62.51 74.17 75.10
EusExams 46.21 56.00 63.28 71.56 73.40

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: HPC Cluster, 4 x A100 64Gb nodes x64
  • Hours used (total GPU hours): 16005.12h
  • Cloud Provider: CINECA HPC
  • Compute Region: Italy
  • Carbon Emitted: 1901.41kg CO2 eq

Citation

To cite our work, please use:

@inproceedings{sainz-etal-2025-instructing,
    title = "Instructing Large Language Models for Low-Resource Languages: A Systematic Study for {B}asque",
    author = "Sainz, Oscar  and
      Perez, Naiara  and
      Etxaniz, Julen  and
      Fernandez de Landa, Joseba  and
      Aldabe, Itziar  and
      Garc{\'i}a-Ferrero, Iker  and
      Zabala, Aimar  and
      Azurmendi, Ekhi  and
      Rigau, German  and
      Agirre, Eneko  and
      Artetxe, Mikel  and
      Soroa, Aitor",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1484/",
    doi = "10.18653/v1/2025.emnlp-main.1484",
    pages = "29136--29160",
    ISBN = "979-8-89176-332-6",
    abstract = "Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to conventional instruction adaptation pipelines in low-resource scenarios. We assume a realistic scenario for low-resource languages, where only the following are available: corpora in the target language, existing open-weight multilingual base and instructed backbone LLMs, and synthetically generated instructions sampled from the instructed backbone. We present a comprehensive set of experiments for Basque that systematically study different combinations of these components evaluated on benchmarks and human preferences from 1,680 participants. Our conclusions show that target language corpora are essential, with synthetic instructions yielding robust models, and, most importantly, that using as backbone an instruction-tuned model outperforms using a base non-instructed model. Scaling up to Llama 3.1 Instruct 70B as backbone, our model comes near frontier models of much larger sizes for Basque, without using any Basque instructions. We release code, models, instruction datasets, and human preferences to support full reproducibility in future research on low-resource language adaptation."
}

Acknowledgements

This work has been partially supported by the Basque Government (IKER-GAITU project).

It has also been partially supported by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project with reference 2022/TL22/00215335.

The models were trained on the Leonardo supercomputer at CINECA under the EuroHPC Joint Undertaking, project EHPC-EXT-2023E01-013.

Downloads last month
45
Safetensors
Model size
71B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for HiTZ/Latxa-Llama-3.1-70B-Instruct-v2

Finetuned
(91)
this model
Quantizations
2 models

Dataset used to train HiTZ/Latxa-Llama-3.1-70B-Instruct-v2

Collection including HiTZ/Latxa-Llama-3.1-70B-Instruct-v2

Paper for HiTZ/Latxa-Llama-3.1-70B-Instruct-v2

Evaluation results