AgaMiko committed
Commit 0a96cd7
1 Parent(s): 95e2874

Update README.md

Files changed (1):
  1. README.md +1 -3
README.md CHANGED
@@ -64,9 +64,7 @@ The model was trained on a POSMAC corpus. Polish Open Science Metadata Corpus (P
 
 # Tokenizer
 
-As in the original HerBERT implementation, the training dataset was tokenized into subwords using a character level byte-pair encoding (CharBPETokenizer) with a vocabulary size of 50k tokens. The tokenizer itself was trained with a tokenizers library.
-
-We kindly encourage you to use the Fast version of the tokenizer, namely HerbertTokenizerFast.
+As in the original plT5 implementation, the training dataset was tokenized into subwords using a SentencePiece unigram model with a vocabulary size of 50k tokens.
 
 # Usage
 
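The updated paragraph points to the plT5 tokenizer, a SentencePiece unigram model with a ~50k vocabulary. As a minimal sketch of what that means in practice, such a tokenizer can be loaded through the transformers library; the `allegro/plt5-base` checkpoint below is an assumption standing in for the plT5 release the README refers to, not something named in this commit:

```python
# Minimal sketch (assumed checkpoint, not part of this commit):
# loading a plT5-style SentencePiece unigram tokenizer via transformers.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("allegro/plt5-base")  # assumed plT5 checkpoint

# Tokenize a Polish sentence into the subword pieces produced by the unigram model.
pieces = tokenizer.tokenize("Model został wytrenowany na korpusie POSMAC.")
print(pieces)
print(tokenizer.vocab_size)  # ~50k tokens, as stated in the README
```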