
How to use this model directly from the 🤗/transformers library:

from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("ixa-ehu/berteus-base-cased")
model = AutoModelWithLMHead.from_pretrained("ixa-ehu/berteus-base-cased")
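Once loaded, the model can be queried for masked-token predictions. A minimal sketch using the transformers `pipeline` API (the Basque sentence is only an illustrative example, not from the model card):

```python
from transformers import pipeline

# Load BERTeus into a fill-mask pipeline; this downloads the weights
# from the Hugging Face Hub on first use.
fill = pipeline("fill-mask", model="ixa-ehu/berteus-base-cased")

# Predict the masked token; by default the pipeline returns the top 5 candidates,
# each with its predicted token string and probability score.
predictions = fill("Euskal Herriko hiriburua [MASK] da.")
for p in predictions:
    print(p["token_str"], p["score"])
```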

BERTeus base cased

This is the Basque-language pretrained model presented in Give your Text Representation Models some Love: the Case for Basque. The model has been trained on a corpus comprising Basque news articles crawled from online newspapers and the Basque Wikipedia. The training corpus contains 224.6 million tokens, of which 35 million come from Wikipedia.

BERTeus has been tested on four downstream tasks for Basque: part-of-speech (POS) tagging, named entity recognition (NER), sentiment analysis, and topic classification, improving the state of the art on all four. A summary of results is shown below:

| Downstream task      | BERTeus | mBERT | Previous SOTA |
|----------------------|---------|-------|---------------|
| Topic Classification | 76.77   | 68.42 | 63.00         |
| Sentiment            | 78.10   | 71.02 | 74.02         |
| POS                  | 97.76   | 96.37 | 96.10         |
| NER                  | 87.06   | 81.52 | 76.72         |
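The downstream results above come from fine-tuning BERTeus per task. One way to start such a fine-tuning run is to attach a fresh classification head via transformers; a hedged sketch, not the authors' actual training setup (the `num_labels=4` value is illustrative, not the paper's label set):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical sketch: load BERTeus with a randomly initialized
# sequence-classification head on top of the pretrained encoder.
# num_labels is an assumption for illustration only.
tokenizer = AutoTokenizer.from_pretrained("ixa-ehu/berteus-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "ixa-ehu/berteus-base-cased", num_labels=4
)
```

From here, the model can be trained with any standard PyTorch loop or the transformers `Trainer` on a labeled Basque dataset.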

If you use this model, please cite the following paper:

@inproceedings{agerri2020give,
  title={Give your Text Representation Models some Love: the Case for Basque},
  author={Rodrigo Agerri and I{\~n}aki San Vicente and Jon Ander Campos and Ander Barrena and Xabier Saralegi and Aitor Soroa and Eneko Agirre},
  booktitle={Proceedings of the 12th International Conference on Language Resources and Evaluation},
  year={2020}
}