bodo-roberta-base-sentencepiece-mlm / tokenizer_config.json
Commit f123184 by Sanjib Narzary: "sentence piece tokenizer added"
{
  "clean_up_tokenization_spaces": true,
  "model_max_length": 512,
  "special_tokens": [
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<cls>",
    "<sep>",
    "<mask>"
  ],
  "tokenizer_class": "PreTrainedTokenizerFast"
}
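The following is a minimal, standard-library-only Python sketch of how the fields in this config can be read and what two of them control. It is not the transformers implementation. Several assumptions are involved. Truncation keeps the first `model_max_length` ids (right-side truncation). The cleanup rules approximate the space-before-punctuation replacements that transformers applies when `clean_up_tokenization_spaces` is true, so verify them against your installed version. The token ids are hypothetical. The sketch only reads the `special_tokens` list and does not show how transformers itself treats that key.

```python
import json

# The tokenizer_config.json contents shown above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "clean_up_tokenization_spaces": true,
  "model_max_length": 512,
  "special_tokens": ["<s>", "<pad>", "</s>", "<unk>", "<cls>", "<sep>", "<mask>"],
  "tokenizer_class": "PreTrainedTokenizerFast"
}
"""

config = json.loads(CONFIG_TEXT)


def truncate_ids(ids, max_length):
    """Keep at most max_length ids, dropping from the right (assumed truncation side)."""
    return ids[:max_length]


# Replacements approximating the cleanup transformers performs on decoded text
# (assumption: check the library source for the exact rule set).
CLEANUP_RULES = [
    (" .", "."), (" ?", "?"), (" !", "!"), (" ,", ","), (" ' ", "'"),
    (" n't", "n't"), (" 'm", "'m"), (" 's", "'s"), (" 've", "'ve"), (" 're", "'re"),
]


def clean_up_tokenization(text):
    """Remove the space inserted before punctuation and English contractions."""
    for before, after in CLEANUP_RULES:
        text = text.replace(before, after)
    return text


def postprocess(text, cfg):
    """Apply the cleanup only when the config enables it."""
    if cfg["clean_up_tokenization_spaces"]:
        return clean_up_tokenization(text)
    return text


if __name__ == "__main__":
    hypothetical_ids = list(range(600))  # stand-in for a too-long encoded sequence
    print(len(truncate_ids(hypothetical_ids, config["model_max_length"])))
    print(postprocess("Hello , world !", config))
    print(config["special_tokens"])
```

Because `clean_up_tokenization_spaces` is `true`, decoded text such as `Hello , world !` would be tidied to `Hello, world!`. Setting it to `false` leaves the spacing exactly as the tokenizer produced it.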