Training Mistral-7B-v0.1 with Sliding Window = Null

#94
by yardenhoch - opened

Hello,

I noticed that in the recent release of Mistral-7B-Instruct-v0.2, the sliding_window parameter is set to null. I'm curious to know if it's possible to apply the same setting when training the Mistral-7B-v0.1 model.

Could you please provide some guidance on this? Is there any specific reason why sliding_window is set to null in the newer version, and what would be the implications if we apply the same setting to the older version?

Thank you in advance for your help.

Hi @yardenhoch

Thanks for the issue! You can manually set a new value for sliding_window in the model's config. If you have cloned the repo locally, you can edit the config file directly; otherwise, you can set model.config.sliding_window = xxx before launching training.
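For the local-clone route, here is a minimal sketch using only the Python standard library. It assumes the cloned repo contains a config.json with a sliding_window key; the helper name set_sliding_window and the path in the usage note are hypothetical and only for illustration.

```python
import json
from pathlib import Path


def set_sliding_window(config_path, value=None):
    """Rewrite the `sliding_window` field of a local config.json.

    Hypothetical helper, not part of any library.
    `value=None` is written to the file as JSON `null`.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["sliding_window"] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_sliding_window("path/to/your/clone/config.json") would write "sliding_window": null. If the model is already loaded in memory, assigning model.config.sliding_window = None before training should be the equivalent in-memory change.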

@ybelkada Thank you for your response.

To clarify, does this mean that even if my prompt is longer than 4096 tokens, all the words will be processed together when sliding_window is set to null? I'm trying to understand the implications of this setting on longer prompts.

Hi @yardenhoch

> To clarify, does this mean that even if my prompt is longer than 4096 tokens, all the words will be processed together when sliding_window is set to null?

I think so, yes! All tokens will be processed together when sliding_window is set to null.
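To make the difference concrete, here is a small illustrative sketch in plain Python of which earlier positions a token can attend to within a single attention layer, with and without a window. The function visible_positions is hypothetical and is not how the library implements masking; the exact window boundary in a real implementation may differ by one position.

```python
def visible_positions(query_pos, sliding_window=None):
    """Return the key positions a token at `query_pos` may attend to.

    Illustrative only: assumes a causal mask and, when a window is set,
    that the window covers the current token plus the
    `sliding_window - 1` tokens before it.
    """
    if sliding_window is None:
        start = 0  # no window: every earlier token is visible
    else:
        start = max(0, query_pos - sliding_window + 1)
    return list(range(start, query_pos + 1))


# Token at position 5 with no window sees positions 0..5.
full = visible_positions(5, sliding_window=None)
# The same token with a window of 3 sees only positions 3..5.
windowed = visible_positions(5, sliding_window=3)
```

So with sliding_window set to null, a token at position 5000 would be able to attend directly to every earlier token, while with a window of 4096 it would only attend directly to the most recent 4096 positions in that layer.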
