Upload tokenizer.json

#3 by lostlex

HuggingFace repositories store tokenizers in two flavors (see the snippet after this list):

  1. "slow tokenizer" - a tokenizer implemented in Python, stored across files such as tokenizer_config.json
  2. "fast tokenizer" - a tokenizer implemented in Rust, stored as a single tokenizer.json

This repository only includes files for the slow tokenizer. While the transformers library automatically converts a slow tokenizer to a fast one whenever possible, Bumblebee relies on the Rust bindings and therefore always requires the tokenizer.json file. This change adds the fast tokenizer.
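
Once merged, the presence of the fast tokenizer file can be confirmed with huggingface_hub. The revision="refs/pr/3" argument below assumes the Hub's usual PR ref naming for discussion #3 and can be dropped after merging:

```python
from huggingface_hub import list_repo_files

# Confirm that tokenizer.json is present on the PR branch (or on main after merging)
files = list_repo_files("unitary/unbiased-toxic-roberta", revision="refs/pr/3")
assert "tokenizer.json" in files
```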

Generated with:

```python
from transformers import AutoTokenizer

# AutoTokenizer returns the fast (Rust-backed) tokenizer when one can be built
tokenizer = AutoTokenizer.from_pretrained("unitary/unbiased-toxic-roberta")
assert tokenizer.is_fast

# Saving a fast tokenizer writes tokenizer.json alongside the other tokenizer files
tokenizer.save_pretrained("...")
```