Thanks

#1
by MB7977 - opened

Thank you for this. I'm curious to see whether an exl2 version might cure some of the issues I've had with other quants of it. Just a heads up that you're missing the config.json. I used the one from TheBloke's repo but needed to fix max_position_embeddings, as it's incorrectly set to 2048. When quanting, exllamav2 threw an illegal memory access/shaping error until I changed it to 4096 (the sequence length I'm quantizing at).
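For anyone hitting the same error, a minimal sketch of the fix: load config.json, correct max_position_embeddings, and write it back. The 4096 value and the file contents here are illustrative, matching the numbers discussed above, not the exact file from this repo.

```python
import json

# Illustrative config with the context length wrongly set to 2048,
# mirroring the broken config.json described above.
broken_cfg = {"model_type": "llama", "max_position_embeddings": 2048}

def fix_context_length(cfg: dict, ctx: int = 4096) -> dict:
    """Return a copy of the config with max_position_embeddings corrected."""
    fixed = dict(cfg)
    fixed["max_position_embeddings"] = ctx
    return fixed

# In practice you would json.load() the repo's config.json, apply the fix,
# and json.dump() it back before running the exllamav2 quantization.
fixed_cfg = fix_context_length(broken_cfg)
print(json.dumps(fixed_cfg))
```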

Thanks! I hadn't uploaded it because I ran the same test and it failed for me at 2048 ctx. I've just uploaded the corrected config.json.
