sliding_window appears to be None. TypeError: bad operand type for unary -: 'NoneType'

#56 opened by narai

Error when running the model card code:

File .../lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py:88, in _make_sliding_window_causal_mask(input_ids_shape, dtype, device, past_key_values_length, sliding_window)
86 mask = torch.tril(tensor, diagonal=0)
87 # make the mask banded to account for sliding window
---> 88 mask = torch.triu(mask, diagonal=-sliding_window)
89 mask = torch.log(mask).to(dtype)
91 if past_key_values_length > 0:

TypeError: bad operand type for unary -: 'NoneType'
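For context, a minimal sketch (not the library code) of why the call fails: the unary minus in -sliding_window is evaluated before torch.triu is even called, so a None value raises the TypeError immediately.

sliding_window = None
try:
    diagonal = -sliding_window
except TypeError as exc:
    print(exc)  # bad operand type for unary -: 'NoneType'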

Solved with:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

ckpt = "mistralai/Mistral-7B-Instruct-v0.2"

# Load the config and set sliding_window explicitly so it is no longer None
config = AutoConfig.from_pretrained(ckpt)
config.update({'sliding_window': 4096})

model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", config=config)
tokenizer = AutoTokenizer.from_pretrained(ckpt, config=config)
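A quick check that generation works after the override (the prompt and generation settings are arbitrary, following the usual chat-template pattern):

messages = [{"role": "user", "content": "What is your favourite condiment?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))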

Hi @narai
You can also solve the issue by updating transformers: pip install -U transformers

You can also just pass the option to from_pretrained (instead of using AutoConfig solely for that purpose): AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", sliding_window=4096)
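A minimal sketch of that approach (extra keyword arguments that match config fields are forwarded to the model config by from_pretrained):

from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "mistralai/Mistral-7B-Instruct-v0.2"

# sliding_window is forwarded to the config, so no separate AutoConfig step is needed
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", sliding_window=4096)
tokenizer = AutoTokenizer.from_pretrained(ckpt)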


However, it says that v0.2 doesn't use sliding-window attention. Should we set sliding_window=4096, or set it to 32k instead?
