lostlex committed · Commit 4880856 · 1 Parent(s): 36295dd

Upload tokenizer.json


HuggingFace repositories store tokenizers in two flavors:

1. "slow tokenizer" - a tokenizer implemented in Python, stored as tokenizer_config.json together with the vocabulary files
2. "fast tokenizer" - a tokenizer implemented in Rust, stored as tokenizer.json

This repository only includes the files for the slow tokenizer. While the transformers library automatically converts a slow tokenizer to a fast tokenizer whenever possible, [Bumblebee](https://github.com/elixir-nx/bumblebee/) relies on the Rust bindings and therefore always requires the tokenizer.json file. This change adds the fast tokenizer.
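
For context, the two flavors can be requested explicitly through the `use_fast` argument; the class names in the comments below are what transformers typically returns for a RoBERTa checkpoint and are an assumption here, not something read from this repo:

```python
from transformers import AutoTokenizer

# Python ("slow") tokenizer, built from the vocabulary files already in this repo
slow = AutoTokenizer.from_pretrained("unitary/unbiased-toxic-roberta", use_fast=False)

# Rust-backed ("fast") tokenizer; without tokenizer.json, transformers converts the slow one on the fly
fast = AutoTokenizer.from_pretrained("unitary/unbiased-toxic-roberta", use_fast=True)

print(type(slow).__name__, type(fast).__name__)  # e.g. RobertaTokenizer RobertaTokenizerFast
print(fast.is_fast)                               # True
```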

Generated with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unitary/unbiased-toxic-roberta")
assert tokenizer.is_fast  # the slow tokenizer was auto-converted to a fast one
tokenizer.save_pretrained("...")  # writes tokenizer.json (among other files) to the target directory
```
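
Once tokenizer.json is in the repository, the Rust-backed bindings can load it directly, with no slow-to-fast conversion step, which is essentially what Bumblebee relies on. A minimal sketch using the Python tokenizers and huggingface_hub packages (the sample sentence is arbitrary):

```python
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Fetch only the fast-tokenizer file added by this commit and load it directly
path = hf_hub_download("unitary/unbiased-toxic-roberta", "tokenizer.json")
tokenizer = Tokenizer.from_file(path)
print(tokenizer.encode("This is a test sentence.").tokens)
```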

Files changed (1):
  1. tokenizer.json (ADDED; diff too large to render)