sliding_window appears to be None. TypeError: bad operand type for unary -: 'NoneType'
Error when running the model card code:
File .../lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:88, in _make_sliding_window_causal_mask(input_ids_shape, dtype, device, past_key_values_length, sliding_window)
86 mask = torch.tril(tensor, diagonal=0)
87 # make the mask banded to account for sliding window
---> 88 mask = torch.triu(mask, diagonal=-sliding_window)
89 mask = torch.log(mask).to(dtype)
91 if past_key_values_length > 0:
TypeError: bad operand type for unary -: 'NoneType'
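For context, the message itself is plain Python behaviour and does not depend on torch: negating None raises exactly this TypeError. The snippet below is only a minimal sketch of that failure. The variable name is borrowed from the traceback, and the assumption (suggested by the title, not confirmed here) is that the loaded config supplies None for sliding_window.

```python
# Minimal reproduction of the failing expression, without torch or transformers.
# Assumption: the config hands None to the mask helper as sliding_window.
sliding_window = None

try:
    diagonal = -sliding_window  # same unary minus as in the traceback
except TypeError as exc:
    message = str(exc)

print(message)
```

Any integer value for sliding_window avoids the exception, which is why the workarounds in this thread set it explicitly.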
Solved with:
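from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

ckpt = "mistralai/Mistral-7B-Instruct-v0.2"
# Load the config and set sliding_window explicitly, since it comes back as None
config = AutoConfig.from_pretrained(ckpt)
config.update({'sliding_window': 4096})
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", config=config)
tokenizer = AutoTokenizer.from_pretrained(ckpt, config=config)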
You can also pass the option directly to from_pretrained, instead of loading an AutoConfig just for that purpose: AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", sliding_window=4096)
However, v0.2 is described as not using sliding-window attention. Should we set sliding_window=4096, or just set sliding_window to 32k?
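One way to reason about the two values is to look at what the banded mask from the traceback does. The sketch below is a pure-Python imitation of the tril / triu(diagonal=-sliding_window) pair shown there. The function name banded_causal_mask is hypothetical, and whether every attention code path in transformers follows this same mask is an assumption.

```python
def banded_causal_mask(seq_len, sliding_window):
    """Pure-Python imitation of tril(diagonal=0) followed by
    triu(diagonal=-sliding_window): position i may attend to position j
    only when 0 <= i - j <= sliding_window."""
    return [
        [1 if 0 <= i - j <= sliding_window else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def causal_mask(seq_len):
    """Plain lower-triangular causal mask, with no band."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


# With a window smaller than the sequence, older tokens are masked out.
small = banded_causal_mask(5, 2)

# With a window at least as long as the sequence, the band never cuts in,
# so both candidate values give the plain causal mask on a short input.
same_4096 = banded_causal_mask(5, 4096) == causal_mask(5)
same_32k = banded_causal_mask(5, 32768) == causal_mask(5)
```

Under this reading, the two settings only differ for inputs longer than 4096 tokens: 4096 would start hiding older tokens at that point, while a window as large as the maximum context would behave like no sliding window at all.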