Update config.json to accurately reflect the 32k context window.

#73

Replace config.json so it reflects that there is no sliding attention window, so the context length accurately reflects the 32k context window, and update to the latest transformers version.
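A minimal sketch of the fields this change would touch in config.json, assumed from the description above rather than copied from the actual diff (the transformers_version bump is omitted since the target release isn't stated):

```json
{
  "max_position_embeddings": 32768,
  "sliding_window": null
}
```

Setting `sliding_window` to `null` tells loaders that v0.2 uses full attention over the whole 32k context rather than windowed attention.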

The original config is correct.

Then why can't I use all 32k context? I have SFT'ed the Base 7b 0.2 and it will summarize 29k Mistral tokens of text without issue. This one will not.

@Kearm I think the model loader is wrongly assuming all Mistral models use sliding-window attention, when v0.2 explicitly says it does not support sliding-window attention. Check with the model loader framework. I could be wrong on this.
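One quick way to check what the loader actually sees is to inspect the resolved config. This is only a sketch; the model id below is illustrative, so substitute the repo under discussion:

```python
# Check whether the loaded config disables sliding-window attention
# for this checkpoint (model id is an example, not the PR's repo).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print("max_position_embeddings:", config.max_position_embeddings)
print("sliding_window:", config.sliding_window)

# If sliding_window is an integer (e.g. 4096) instead of None, the loader
# applies windowed attention and effectively caps usable context below 32k.
```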

@Qubitium I was using the most recent raw transformers over 6 GPUs: 2x 3090 Ti and 4x 3090. My attempt to replicate it fails again, but my SFT of the new base v0.2 model works perfectly.

