
HerBERT

HerBERT is a BERT-based language model trained on Polish corpora using Masked Language Modelling (MLM) and Sentence Structural Objective (SSO) with dynamic masking of whole words. Model training and experiments were conducted with the transformers library, version 2.9.
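To make the dynamic whole-word masking concrete, here is a minimal sketch (a hypothetical helper, not the actual training code): each pass samples a fresh random set of whole words to mask, so the model sees different masks across epochs, and every subword of a selected word is masked together.

import random

def mask_whole_words(words, mask_prob=0.15, mask_token="<mask>"):
    # Hypothetical illustration: mask entire words (in training, all of a
    # word's subwords are masked together), re-sampled on every call
    # ("dynamic" masking).
    return [mask_token if random.random() < mask_prob else w for w in words]

sentence = "A potem szedł środkiem drogi w kurzawie".split()
print(mask_whole_words(sentence))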

Tokenizer

The training dataset was tokenized into subwords using CharBPETokenizer, a character-level byte-pair encoding with a vocabulary size of 50k tokens. The tokenizer itself was trained with the tokenizers library. We kindly encourage you to use the fast version of the tokenizer, namely HerbertTokenizerFast.
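For example, you can load the fast tokenizer and inspect the subword split it produces (a minimal sketch, assuming a transformers release that includes HerbertTokenizerFast; the sample sentence is purely illustrative):

from transformers import HerbertTokenizerFast

# Load the fast (Rust-backed) tokenizer recommended above.
tokenizer = HerbertTokenizerFast.from_pretrained("allegro/herbert-base-cased")

# Inspect the character-level BPE subword split of a sample sentence.
print(tokenizer.tokenize("Ala ma kota."))
print(tokenizer.vocab_size)  # vocabulary size described above (~50k)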

HerBERT usage

Example code:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
model = AutoModel.from_pretrained("allegro/herbert-base-cased")

# Encode a pair of Polish sentences and run them through the model.
output = model(
    **tokenizer.batch_encode_plus(
        [
            (
                "A potem szedł środkiem drogi w kurzawie, bo zamiatał nogami, ślepy dziad prowadzony przez tłustego kundla na sznurku.",
                "A potem leciał od lasu chłopak z butelką, ale ten ujrzawszy księdza przy drodze okrążył go z dala i biegł na przełaj pól do karczmy."
            )
        ],
        padding='longest',
        add_special_tokens=True,
        return_tensors='pt'
    )
)
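Continuing the example above, the result can be inspected as follows (a sketch, assuming transformers 2.x, where the model returns a tuple rather than an output object):

# The first element of the tuple holds the final hidden states,
# with shape (batch_size, sequence_length, hidden_size).
last_hidden_state = output[0]
print(last_hidden_state.shape)

# A rough sentence representation for illustration only (assumption:
# taking the first-token vector; this is not an official recommendation).
sentence_embedding = last_hidden_state[:, 0, :]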

License

CC BY-SA 4.0

Authors

The model was trained by the Allegro Machine Learning Research team.

You can contact us at: klejbenchmark@allegro.pl