Pipelines fail with batch_size > 1
Hello everyone,
Using pipelines for inference with this model is currently broken for batch sizes greater than 1. For example, this fails:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained("Unbabel/TowerBase-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("Unbabel/TowerBase-7B-v0.1")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    batch_size=2,
)
examples = [
    "English: My name is TowerBase.\nPortuguese:",
    "English: These are my friends, NLLB, GPT, and wFST.\nPortuguese:",
]
out = pipe(examples)  # raises an indexing error in the embedding layer
The issue seems to be related to a mismatch between the vocabulary the tokenizer uses and what is expected by the model. The tokenizer uses 32004 as its pad_id, but the vocab size is only 32000. Passing a padded batch of mixed-length sequences consequently produces an indexing error in TowerBase's embedding layer.
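You can see the mismatch directly with a quick check like this (a sketch reusing the model and tokenizer loaded above; the printed values are what I observed at the time of writing):

print(tokenizer.pad_token_id)                        # 32004
print(model.config.vocab_size)                       # 32000
print(model.get_input_embeddings().num_embeddings)   # 32000, so id 32004 has no row in the embedding matrix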
I worked around this issue by setting tokenizer.pad_token_id = tokenizer.eos_token_id. I assume TowerInstruct has the same bug and that the same short-term fix applies, but I haven't tested it yet.
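Concretely, the workaround slots in right before building the pipeline (a sketch using the same objects as above):

tokenizer.pad_token_id = tokenizer.eos_token_id   # pad with EOS instead of the out-of-range id
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    batch_size=2,
)
out = pipe(examples)   # now runs without the indexing error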
Thanks for raising this. It should now be fixed, without needing to explicitly set tokenizer.pad_token_id = tokenizer.eos_token_id (https://huggingface.co/Unbabel/TowerBase-7B-v0.1/commit/2837006f6f8e9ed6e637ab8fcf9a6bf22e31e4d8).
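For anyone finding this later, re-running the batched example from the original post against fresh downloads, without touching tokenizer.pad_token_id, should be enough to confirm (a minimal sketch):

tokenizer = AutoTokenizer.from_pretrained("Unbabel/TowerBase-7B-v0.1")   # fresh download picks up the fix from the commit above
model = AutoModelForCausalLM.from_pretrained("Unbabel/TowerBase-7B-v0.1")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, batch_size=2)
out = pipe(examples)   # examples list from the original post; no pad_token_id override needed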