bert-base-dutch-cased / tokenizer_config.json
Commit ff8ab2f by Wietse de Vries: add missing char tokens to vocab (with embeddings close to [UNK])
{
"do_lower_case": false,
"unk_token": "[UNK]",
"sep_token": "[SEP]",
"pad_token": "[PAD]",
"cls_token": "[CLS]",
"mask_token": "[MASK]",
"tokenize_chinese_chars": true,
"strip_accents": null,
"model_max_length": 512
}
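
A minimal usage sketch showing how this config is consumed when the tokenizer is loaded, assuming the transformers library and that the model is published on the Hugging Face Hub; the GroNLP/bert-base-dutch-cased repo id is an assumption based on where BERTje is commonly hosted.

from transformers import AutoTokenizer

# Loads vocab and the settings above; tokenizer_config.json supplies
# do_lower_case, the special tokens, and model_max_length.
tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")

# do_lower_case=false with strip_accents=null means casing and accents
# are preserved, so cased and uncased inputs tokenize differently:
print(tokenizer.tokenize("Amsterdam"))
print(tokenizer.tokenize("amsterdam"))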
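
The commit message describes adding missing character tokens with embeddings close to [UNK]. The sketch below is one way to approximate that idea with the transformers and torch APIs; the actual commit edited the vocab file directly rather than calling add_tokens, and the example characters here are hypothetical.

import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = BertForMaskedLM.from_pretrained("GroNLP/bert-base-dutch-cased")

# Hypothetical examples of characters missing from the vocab.
missing_chars = ["ë", "ï"]
tokenizer.add_tokens(missing_chars)

# Grow the embedding matrix to cover the new vocab entries.
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    unk_emb = emb[tokenizer.unk_token_id]
    for tok in missing_chars:
        tok_id = tokenizer.convert_tokens_to_ids(tok)
        # Initialize close to [UNK]: copy its embedding plus small noise,
        # so the new tokens start out behaving like unknown tokens.
        emb[tok_id] = unk_emb + 0.01 * torch.randn_like(unk_emb)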