Unable to run inference beyond the sliding window length

#128
by kreas - opened

Using the following config:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # already places the model, so a trailing .to(device) is unnecessary
    use_flash_attention_2=True,
)
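
Then generating past the window reproduces the failure. A minimal sketch follows; the prompt and max_new_tokens are hypothetical, chosen only to push the total sequence length past the 4096-token sliding window of Mistral-7B-v0.1:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)

# Any generation that pushes the total sequence length past
# sliding_window (4096 for this model) hits the cache-shape check.
output = model.generate(**inputs, max_new_tokens=4200)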

This leads to the following error:

File "/home/andreas/.local/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 408, in forward
raise ValueError(
ValueError: past key much have a shape of (1, 32, 4095, 128), got torch.Size([1, 8, 4094, 128])

This seems to be due to a mismatch between the sliding-window length and the rolling KV-cache length: the check expects sliding_window - 1 = 4095 cached positions while the cache holds 4094, and it also compares against num_attention_heads (32) even though the cache is stored with num_key_value_heads (8) because of grouped-query attention.
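
As a possible workaround (an assumption on my part, not a confirmed fix): the check is raised in the flash-attention forward pass, so loading without flash attention 2 should avoid this code path at the cost of slower attention:

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",
    # omitting use_flash_attention_2 falls back to the default attention
    # implementation, which does not perform this cache-shape check
)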
