julien-c HF staff committed on
Commit
59e1725
1 Parent(s): 3789c08

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/allegro/herbert-large-cased/README.md

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ language: pl
+ tags:
+ - herbert
+ license: cc-by-sa-4.0
+ ---
+ # HerBERT
+ **[HerBERT](https://en.wikipedia.org/wiki/Zbigniew_Herbert)** is a BERT-based language model trained on Polish corpora
+ using the masked language modelling (MLM) and sentence structural (SSO) objectives with dynamic masking of whole words.
+ Model training and experiments were conducted with [transformers](https://github.com/huggingface/transformers) version 2.9.
+
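To make "dynamic masking of whole words" concrete, here is a minimal sketch in plain Python. It is not HerBERT's training code: the function names, the `</w>` end-of-word marker, the `<mask>` token, and the 15% masking rate are all assumptions chosen for illustration, and the usual random-replacement details of MLM are left out. The masking is "whole word" because every subword of a selected word is masked together, and "dynamic" because a fresh mask is drawn on every call instead of being fixed once during preprocessing.

```python
import random

MASK_TOKEN = "<mask>"  # assumed mask token, for illustration only
END_OF_WORD = "</w>"   # assumed suffix marking the last subword of a word


def group_into_words(tokens):
    """Group subword indices into words; a word ends at a token carrying END_OF_WORD."""
    words, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if tok.endswith(END_OF_WORD):
            words.append(current)
            current = []
    if current:  # trailing subwords without an end marker still form a word
        words.append(current)
    return words


def mask_whole_words(tokens, mask_prob=0.15, rng=random):
    """Return a copy of tokens in which every subword of each selected word is masked."""
    words = group_into_words(tokens)
    # Mask at least one word so that every call produces something to predict.
    n_to_mask = max(1, round(len(words) * mask_prob))
    masked = list(tokens)
    for word in rng.sample(words, n_to_mask):
        for i in word:
            masked[i] = MASK_TOKEN
    return masked


# Hypothetical subword split of "ślepy dziad prowadzony przez".
tokens = ["śle", "py</w>", "dziad</w>", "prowa", "dzo", "ny</w>", "przez</w>"]

# Calling the function once per epoch draws a new mask each time.
epoch_1 = mask_whole_words(tokens, rng=random.Random(0))
epoch_2 = mask_whole_words(tokens, rng=random.Random(1))
```

In each result, a word is either fully masked or left untouched; no word is ever masked only in part.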
+ ## Tokenizer
+ The training dataset was tokenized into subwords using ``CharBPETokenizer``, a character-level byte-pair encoding with
+ a vocabulary size of 50k tokens. The tokenizer itself was trained with the [tokenizers](https://github.com/huggingface/tokenizers) library.
+ We kindly encourage you to use the **Fast** version of the tokenizer, namely ``HerbertTokenizerFast``.
+
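As an illustration of what character-level byte-pair encoding does, the sketch below learns a few merges on a toy corpus in plain Python. It does not use the tokenizers library and is not how the HerBERT vocabulary was built: the toy word counts, the helper names, and the `</w>` end-of-word marker are assumptions for this example. The real tokenizer keeps merging until the vocabulary reaches 50k tokens; here only three merges are learned.

```python
from collections import Counter


def to_symbols(word):
    """Split a word into characters, marking the last one as end-of-word."""
    return tuple(word[:-1]) + (word[-1] + "</w>",)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with a single merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} mapping."""
    vocab = {to_symbols(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): c for symbols, c in vocab.items()}
    return merges


def encode(word, merges):
    """Split a word into subwords by applying the learned merges in order."""
    symbols = to_symbols(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus with made-up frequencies.
merges = learn_bpe({"droga": 5, "drogi": 4, "drodze": 3, "las": 2}, num_merges=3)
subwords = encode("drogą", merges)  # ["drog", "ą</w>"]
```

The word "drogą" never appears in the toy corpus, yet it is still split into a learned stem and a single-character ending. This is how a subword vocabulary handles inflected forms it has not seen.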
+ ## HerBERT usage
+
+
+ Example code:
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ # Load the tokenizer and the model weights from the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-large-cased")
+ model = AutoModel.from_pretrained("allegro/herbert-large-cased")
+
+ # Encode one sentence pair as a padded batch of PyTorch tensors and run the model on it.
+ output = model(
+     **tokenizer.batch_encode_plus(
+         [
+             (
+                 "A potem szedł środkiem drogi w kurzawie, bo zamiatał nogami, ślepy dziad prowadzony przez tłustego kundla na sznurku.",
+                 "A potem leciał od lasu chłopak z butelką, ale ten ujrzawszy księdza przy drodze okrążył go z dala i biegł na przełaj pól do karczmy."
+             )
+         ],
+         padding='longest',
+         add_special_tokens=True,
+         return_tensors='pt'
+     )
+ )
+ ```
+
+
+ ## License
+ CC BY-SA 4.0
+
+
+ ## Authors
+ The model was trained by the **Allegro Machine Learning Research** team.
+
+ You can contact us at: <a href="mailto:klejbenchmark@allegro.pl">klejbenchmark@allegro.pl</a>