More Logits Than Tokens in Vocab
#4
by calbors - opened
The following snippet fails at the final assertion:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LongSafari/hyenadna-large-1m-seqlen-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Fails: the model reports more logits than the tokenizer has tokens.
assert model.vocab_size == len(tokenizer.get_vocab())
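Continuing from the snippet above, printing the two sizes instead of asserting makes the mismatch concrete (the exact numbers depend on the checkpoint's config):

# Continuing from the snippet above: compare the two sizes directly.
vocab = tokenizer.get_vocab()
print("model.vocab_size:", model.vocab_size)        # size of the logit dimension
print("tokenizer vocab size:", len(vocab))          # tokens the tokenizer defines
print("tokens:", sorted(vocab, key=vocab.get))      # tokenizer tokens in id order
# Logit indices past len(vocab) have no corresponding token in the tokenizer.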
Was a different vocab used during training?