Update config.json to accurately reflect the 32k context window.

#73

Replace config.json so it reflects that there is no sliding attention window, so the context length accurately reflects the 32k context window, and update to the latest transformers version.
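A minimal sketch of the fields this change would touch in config.json, assumed from the description above rather than copied from the actual diff (the transformers_version bump is omitted since the target release isn't stated):

```json
{
  "max_position_embeddings": 32768,
  "sliding_window": null
}
```

Setting `sliding_window` to `null` tells loaders that v0.2 uses full attention over the whole 32k context rather than windowed attention.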

The original config is correct.

Then why can't I use all 32k context? I have SFT'ed the Base 7b 0.2 and it will summarize 29k Mistral tokens of text without issue. This one will not.

@Kearm I think the model loader is wrongly assuming all Mistral models use sliding-window attention, when v0.2 explicitly says it does not support sliding-window attention. Check with the model loader framework. I could be wrong on this.
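One quick way to check what the loader actually sees is to inspect the resolved config. This is only a sketch; the model id below is illustrative, so substitute the repo under discussion:

```python
# Check whether the loaded config disables sliding-window attention
# for this checkpoint (model id is an example, not the PR's repo).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print("max_position_embeddings:", config.max_position_embeddings)
print("sliding_window:", config.sliding_window)

# If sliding_window is an integer (e.g. 4096) instead of None, the loader
# applies windowed attention and effectively caps usable context below 32k.
```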

@Qubitium I was using the most recent raw transformers over 6 GPUs: 2x 3090 Ti and 4x 3090. My attempt to replicate it fails again, but my SFT of the new base v0.2 model works perfectly.

