Upload tokenizer.json #3
by lostlex - opened
HuggingFace repositories store tokenizers in two flavors:
- "slow tokenizer" - implemented in Python and stored in tokenizer_config.json (plus separate vocabulary files)
- "fast tokenizer" - implemented in Rust and stored as a single tokenizer.json
This repository includes only the slow tokenizer files. While the transformers library automatically converts a slow tokenizer to a fast one whenever possible, Bumblebee relies on the Rust bindings and therefore always requires the tokenizer.json file. This change adds the fast tokenizer.
Generated with:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unitary/unbiased-toxic-roberta")
assert tokenizer.is_fast  # the automatic slow-to-fast conversion succeeded
tokenizer.save_pretrained("...")  # writes tokenizer.json alongside the other files
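As a sanity check that the exported file is self-contained, a minimal sketch of the round trip through the tokenizer.json format, using the tokenizers library directly (the Rust bindings that Bumblebee wraps). The word-level vocabulary here is a hypothetical toy example, not the actual RoBERTa tokenizer:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Hypothetical toy vocabulary, just to illustrate the round trip.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tok = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# Serialize everything to a single tokenizer.json file...
path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tok.save(path)

# ...and load it back purely through the Rust bindings,
# without any Python-side tokenizer code or extra vocab files.
loaded = Tokenizer.from_file(path)
print(loaded.encode("hello world").ids)  # → [1, 2]
```

This is the property Bumblebee depends on: the whole tokenizer (model, vocabulary, pre-tokenizer configuration) lives in one JSON file that the Rust runtime can load on its own.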