TheBloke/Llama-2-7B-Chat-GPTQ · Please quantise these models with GPTQ

Aug 1, 2023

Hello, sorry for the off-topic post, but I'm new here and don't know another way to message you. You are doing a great job for the community by providing so many quantised models, which lets people with limited computing resources run models on their own computers. Thank you very much for that! Please quantise these models with GPTQ in 4 bits (ideally so they can run on laptops with an Nvidia 4050 with 6 GB of VRAM or more): togethercomputer/LLaMA-2-7B-32K; cerebras/btlm-3b-8k-base; James-WYang/BigTranslate. Thank you very much if you can help, and thanks anyway for the great job you are doing for everyone!

TheBloke

Owner Aug 1, 2023

You're welcome!

I've already done BigTranslate: https://huggingface.co/TheBloke/BigTrans-13B-GPTQ

I will look at llama 2 32k later today.

I don't think I can do btlm as it's a new model format not supported by GPTQ or GGML yet.

AiModelsMarket

Aug 1, 2023


edited Aug 1, 2023

I can run btlm in Oobabooga with Transformers by loading it in 4-bit, with a reduced max_new_tokens (model reply size). This makes me believe it can also be quantised with GPTQ in 4 bits (a noob opinion; you are the expert here :) ). Is there any way to reduce the BigTrans-13B model size further so it fits in 6 GB of VRAM? Your GPTQ-quantised version is 7.9 GB. Is it possible to quantise it in 3-bit, 2-bit, or anything else that can squeeze it down so that it fits on a video card with 6 GB of VRAM?
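
For a rough sense of the numbers behind this question, here is a minimal back-of-the-envelope sketch. It assumes a round 13 billion parameters and counts only the quantised weights; the function name `weight_gb` and the parameter count are illustrative assumptions, not figures from the model card. It ignores quantisation metadata (scales and zero points), any layers kept in higher precision, and the activations and KV cache needed at inference time, so real files and real VRAM use will be larger. The 7.9 GB reported above is indeed larger than the 4-bit figure this produces.

```python
def weight_gb(n_params: float, bits: float) -> float:
    """Approximate size of the quantised weights alone, in decimal GB.

    Ignores scales/zero points, unquantised layers, and runtime memory.
    """
    return n_params * bits / 8 / 1e9


# Assumed round figure for a "13B" model; the real count may differ.
N_PARAMS = 13e9

for bits in (4, 3, 2):
    print(f"{bits}-bit weights: ~{weight_gb(N_PARAMS, bits):.2f} GB")
```

Under these assumptions, lower bit widths shrink the weights proportionally, but whether a given quantisation actually fits in 6 GB also depends on the overheads left out here and on the context length used.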