Load the model with the transformers library:

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/est-roberta")

NOTE: it is currently critically important to pass the use_fast=False parameter to the tokenizer when using transformers version 4 or later (earlier versions use use_fast=False by default). By default, version 4+ attempts to load a fast tokenizer. This may load without raising an error, but it will not tokenize correctly, because fast tokenizers are currently not supported for CamemBERT-based models.
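The note above can be turned into a loading routine that applies use_fast=False on transformers 4+ and then fails loudly if a fast tokenizer was loaded anyway. This is a minimal sketch: the helper names needs_explicit_slow_tokenizer and load_est_roberta are hypothetical, and the version parsing assumes a standard "MAJOR.MINOR.PATCH" string. It relies only on transformers.__version__, AutoTokenizer.from_pretrained, AutoModelForMaskedLM.from_pretrained and the tokenizer's is_fast property.

```python
def needs_explicit_slow_tokenizer(transformers_version: str) -> bool:
    """Hypothetical helper: True for transformers 4+, where fast tokenizers are the default."""
    # Assumes a version string such as "4.40.0" or "4.40.0.dev0".
    major = int(transformers_version.split(".")[0])
    return major >= 4


def load_est_roberta():
    """Hypothetical loader for EMBEDDIA/est-roberta (downloads the model when called)."""
    # Imported inside the function so the version helper works without transformers installed.
    import transformers
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    kwargs = {}
    if needs_explicit_slow_tokenizer(transformers.__version__):
        # Version 4+ defaults to a fast tokenizer, which is not supported for this model.
        kwargs["use_fast"] = False
    tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta", **kwargs)

    # Guard against silently using a fast tokenizer that would tokenize incorrectly.
    if tokenizer.is_fast:
        raise RuntimeError("Expected the slow (SentencePiece) tokenizer for est-roberta")

    model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/est-roberta")
    return tokenizer, model
```

Calling load_est_roberta() downloads the tokenizer and model weights from the Hugging Face Hub. Passing use_fast=False unconditionally would also work; the version check only mirrors the note's distinction between versions.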


Est-RoBERTa is a monolingual Estonian BERT-like model. It is closely related to the French CamemBERT model. The Estonian corpora used to train the model contain 2.51 billion tokens in total, and the subword vocabulary contains 40,000 tokens.

Est-RoBERTa was trained for 40 epochs.

The mask token is <mask>.
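Since the model is a masked language model with <mask> as its mask token, it can be queried through the transformers fill-mask pipeline. The sketch below is illustrative only: the helper mask_word and the function main are hypothetical, the Estonian example sentence is an arbitrary choice, and no particular predictions are claimed. It uses the slow tokenizer, as the note on use_fast=False requires.

```python
import re


def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Hypothetical helper: replace the first whole-word occurrence of `word` with the mask token."""
    pattern = rf"\b{re.escape(word)}\b"
    # A function replacement avoids interpreting backslashes in mask_token.
    masked, count = re.subn(pattern, lambda _m: mask_token, sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} not found in {sentence!r}")
    return masked


def main() -> None:
    """Print fill-mask candidates (downloads the model when called)."""
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta", use_fast=False)
    fill = pipeline("fill-mask", model="EMBEDDIA/est-roberta", tokenizer=tokenizer)

    # "Tallinn on Eesti pealinn." means "Tallinn is the capital of Estonia."
    text = mask_word("Tallinn on Eesti pealinn.", "pealinn", tokenizer.mask_token)
    for candidate in fill(text):
        print(f"{candidate['token_str']!r}: {candidate['score']:.3f}")
```

Calling main() prints the top candidate tokens for the masked position together with their scores.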