xu-song's picture
update
9495a4f
raw
history blame
324 Bytes
from transformers import AutoTokenizer
from vocab import TokenizerType
# Load the Baichuan2-7B-Chat tokenizer; trust_remote_code is required because
# the repo ships its own custom tokenizer implementation.
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", trust_remote_code=True)
# Tag the tokenizer for the vocab registry: Baichuan2 uses a byte-level BPE
# SentencePiece model.
tokenizer.type = TokenizerType.ByteBPE
# NOTE: fixed typo "vocqbulary" -> "vocabulary" in the description string.
tokenizer.comments = "expand the vocabulary size from 64000 in Baichuan1 to 125696"