bert-base-romanian-uncased-v1

The base, uncased BERT model for Romanian, trained on a 15 GB corpus (version 1.0).

How to use

from transformers import AutoTokenizer, AutoModel
import torch

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1", do_lower_case=True)
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1")

# tokenize a sentence and run through the model
input_ids = torch.tensor(tokenizer.encode("Acesta este un test.", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)

# get encoding
last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
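
The snippet above gives token-level embeddings. If a single vector per sentence is needed, one common, model-agnostic approach is to mean-pool the last hidden states over the non-padding tokens. The sketch below is only illustrative and assumes a recent transformers/PyTorch version; it is not an official recipe for this model.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1", do_lower_case=True)
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1")

# tokenize a small batch with padding so we also get an attention mask
sentences = ["Acesta este un test.", "Acesta este un alt exemplu."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# mean-pool the last hidden states, ignoring padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()            # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)          # (batch, hidden)
sentence_embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)  # (batch, hidden); 768-dim for a base-sized model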

Evaluation

Evaluation is performed on Universal Dependencies Romanian RRT UPOS, XPOS and LAS, and on a NER task based on RONEC. Details, as well as more in-depth tests not shown here, are given in the dedicated evaluation page.

The baseline is the Multilingual BERT model bert-base-multilingual-(un)cased, which, at the time of writing, was the only available BERT model that worked on Romanian.

| Model                          | UPOS  | XPOS  | NER   | LAS   |
|--------------------------------|-------|-------|-------|-------|
| bert-base-multilingual-uncased | 97.65 | 95.72 | 83.91 | 87.65 |
| bert-base-romanian-uncased-v1  | 98.18 | 96.84 | 85.26 | 89.61 |
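
For reference, the UPOS/XPOS tagging tasks and NER are token classification problems. A minimal sketch of loading the pretrained encoder with a token-classification head for fine-tuning is shown below; the label list is purely hypothetical, and this is not the exact evaluation pipeline behind the numbers above.

from transformers import AutoTokenizer, AutoModelForTokenClassification

# hypothetical label set -- the real RONEC / UD Romanian RRT inventories are larger
labels = ["O", "PER", "LOC", "ORG"]

tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-uncased-v1", do_lower_case=True)
model = AutoModelForTokenClassification.from_pretrained(
    "dumitrescustefan/bert-base-romanian-uncased-v1",
    num_labels=len(labels),
)

# the encoder weights come from the pretrained checkpoint; the classification
# head is randomly initialized and must be fine-tuned on the downstream data
# (e.g. RONEC for NER, UD Romanian RRT for UPOS/XPOS) before evaluation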

Corpus

The model is trained on the following corpora (stats in the table below are after cleaning):

| Corpus    | Lines (M) | Words (M) | Chars (B) | Size (GB) |
|-----------|-----------|-----------|-----------|-----------|
| OPUS      | 55.05     | 635.04    | 4.045     | 3.8       |
| OSCAR     | 33.56     | 1725.82   | 11.411    | 11        |
| Wikipedia | 1.54      | 60.47     | 0.411     | 0.4       |
| Total     | 90.15     | 2421.33   | 15.867    | 15.2      |

Acknowledgements

  • We'd like to thank Sampo Pyysalo from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!