---
inference: false
language:
- bg
license: mit
datasets:
- oscar
- chitanka
- wikipedia
tags:
- torch
---

# BERT BASE (cased) finetuned on Bulgarian natural-language-inference data

Pretrained model on the Bulgarian language using a masked language modeling (MLM) objective. It was introduced in [this paper](https://arxiv.org/abs/1810.04805) and first released in [this repository](https://github.com/google-research/bert). This model is cased: it does make a difference between bulgarian and Bulgarian.

The training data is Bulgarian text from [OSCAR](https://oscar-corpus.com/post/oscar-2019/), [Chitanka](https://chitanka.info/) and [Wikipedia](https://bg.wikipedia.org/).

The model was then finetuned on private Bulgarian NLI data and compressed via [progressive module replacing](https://arxiv.org/abs/2002.02925).

### How to use

Here is how to use this model in PyTorch:

```python
>>> import torch
>>> from transformers import AutoModelForSequenceClassification, AutoTokenizer
>>>
>>> model_id = 'rmihaylov/bert-base-nli-theseus-bg'
>>> model = AutoModelForSequenceClassification.from_pretrained(model_id)
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>>
>>> # Encode a premise-hypothesis pair for NLI classification.
>>> inputs = tokenizer.encode_plus(
...     'Няколко момчета играят футбол.',   # "Several boys are playing football."
...     'Няколко момичета играят футбол.',  # "Several girls are playing football."
...     return_tensors='pt')
>>>
>>> outputs = model(**inputs)
>>> contradiction, entailment, neutral = torch.softmax(outputs[0][0], dim=0).detach()
>>> contradiction, neutral, entailment
(tensor(0.9998), tensor(0.0001), tensor(5.9929e-05))
```
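
The snippet above hard-codes the order of the class probabilities. To avoid relying on that assumption, the label names can be read from the checkpoint's own configuration. Below is a minimal sketch (not part of the original card) that maps each probability to its label via `model.config.id2label`, assuming that mapping is populated for this checkpoint; it reuses `model` and `outputs` from the example above.

```python
>>> # Minimal sketch: map probabilities to the labels declared in the model config.
>>> # Assumes model.config.id2label is populated for this checkpoint.
>>> probs = torch.softmax(outputs.logits[0], dim=0).detach()
>>> {model.config.id2label[i]: round(float(p), 4) for i, p in enumerate(probs)}
```

This prints a dictionary keyed by the configured label names, which is less error-prone than unpacking the tensor in a fixed order.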