AgaMiko committed
Commit 0a96cd7
1 Parent(s): 95e2874

Update README.md

Files changed (1):
  1. README.md +1 -3
README.md CHANGED
@@ -64,9 +64,7 @@ The model was trained on a POSMAC corpus. Polish Open Science Metadata Corpus (P
 
 # Tokenizer
 
-As in the original HerBERT implementation, the training dataset was tokenized into subwords using a character level byte-pair encoding (CharBPETokenizer) with a vocabulary size of 50k tokens. The tokenizer itself was trained with a tokenizers library.
-
-We kindly encourage you to use the Fast version of the tokenizer, namely HerbertTokenizerFast.
+As in the original plT5 implementation, the training dataset was tokenized into subwords using a SentencePiece unigram model with a vocabulary size of 50k tokens.
 
 # Usage
 
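The updated paragraph points to the plT5 tokenizer, a SentencePiece unigram model with a ~50k vocabulary. As a minimal sketch of what that means in practice, such a tokenizer can be loaded through the transformers library; the `allegro/plt5-base` checkpoint below is an assumption standing in for the plT5 release the README refers to, not something named in this commit:

```python
# Minimal sketch (assumed checkpoint, not part of this commit):
# loading a plT5-style SentencePiece unigram tokenizer via transformers.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("allegro/plt5-base")  # assumed plT5 checkpoint

# Tokenize a Polish sentence into the subword pieces produced by the unigram model.
pieces = tokenizer.tokenize("Model został wytrenowany na korpusie POSMAC.")
print(pieces)
print(tokenizer.vocab_size)  # ~50k tokens, as stated in the README
```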