Generated with:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
assert tokenizer.is_fast

Looks good to me! cc @sanchit-gandhi

As discussed with @ArthurZ on the PR, the fast tokenizer can always be loaded from the slow one.

So there's no issue with not having the tokenizer.json. That said, happy to merge this PR to improve clarity for the Hub weights.

@sanchit-gandhi yeah, the thing is that the Rust huggingface/tokenizers can only load tokenizer.json. In the Elixir ecosystem we have bindings to huggingface/tokenizers and so rely solely on fast tokenizers :)
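To make the two points above concrete, here is a rough sketch, not taken from the PR itself. The helper names has_tokenizer_json and export_tokenizer_json are hypothetical. The first function only checks that a local model directory ships a parseable tokenizer.json, which is what a consumer built on the Rust tokenizers library would need. The second assumes that transformers is installed and that the slow-to-fast conversion works for the given repo, then writes the fast tokenizer files with save_pretrained.

```python
import json
from pathlib import Path


def has_tokenizer_json(model_dir):
    """Return True if model_dir contains a tokenizer.json that parses as JSON."""
    path = Path(model_dir) / "tokenizer.json"
    if not path.is_file():
        return False
    try:
        with path.open(encoding="utf-8") as f:
            json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True


def export_tokenizer_json(repo_id, out_dir):
    """Load a fast tokenizer (converting from the slow one if needed) and save it.

    Hypothetical helper: assumes transformers is installed and repo_id is reachable.
    """
    # Imported lazily so the file check above works without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
    if not tokenizer.is_fast:
        raise RuntimeError(f"no fast tokenizer available for {repo_id}")
    # For a fast tokenizer, save_pretrained writes tokenizer.json into out_dir.
    tokenizer.save_pretrained(out_dir)
    return has_tokenizer_json(out_dir)
```

Calling export_tokenizer_json("openai/whisper-large-v3", "./whisper-tokenizer") would be one way to produce the file locally before uploading it, under the assumptions stated above.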

Thanks for the explanation! Makes sense - let's merge this one then @ArthurZ @patrickvonplaten

patrickvonplaten changed pull request status to merged
