Hello, can we talk by email or something similar?
Sure, contact me at yohan680919@163.com
Just in case you have not received my reply:
Let me briefly set up the context here:
1.
The Yhyu13/oasst-rlhf-2-llama-30b-7k-steps-gptq-4bit model was quantized without act-order, so it is compatible with Ooba's GPTQ fork, which lags behind and lacks support for act-order.
In practice, you can load the gptq-4bit model either with Ooba's fork or with the latest cuda branch of GPTQ-for-LLaMa (qwopqwop200/GPTQ-for-LLaMa at cuda (github.com)).
But in general, Ooba's fork is more performant, for reasons I don't entirely understand.
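In case it helps, here is roughly how I'd grab either implementation; the URLs are the public GitHub repos and the branch name is the one mentioned above, but please verify the details against the links yourself:

```
# Option A: qwopqwop200's GPTQ-for-LLaMa, cuda branch
git clone --branch cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa.git

# Option B: Ooba's fork (generally faster in my experience)
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git
```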
2.
Next, you need to copy your GPTQ repo (either Ooba's fork, qwopqwop200's cuda branch, or your own modification) into the "repositories" folder of Ooba's textgen-webui repository. The "repositories" folder is not created by default, so you need to mkdir it.
So the path looks like "/yourpath/text-generation-webui/repositories/GPTQ-for-LLaMa".
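As a rough sketch (replace /yourpath with your own install location; the source path of the GPTQ repo is just an example):

```
cd /yourpath/text-generation-webui
mkdir -p repositories                        # not created by default
# copy (or clone) your chosen GPTQ fork into it
cp -r /path/to/GPTQ-for-LLaMa repositories/GPTQ-for-LLaMa
```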
3.
You need to install the GPTQ-for-LLaMa Python wheel (whl); it is generated by Ooba's docker script during its setup. You can also check out my fork of GPTQ-for-LLaMa (yhyu13/GPTQ-for-LLaMa at cuda_ooba (github.com)), where
I added a docker_build_whl.sh script to help build the wheel. You need to modify TORCH_ARCH_LIST and choose your own CUDA image to match your rig.
The wheel is produced under the result/ folder; you need to install it in order to load GPTQ-quantized llama models.
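Roughly, building and installing the wheel looks like this (the script name is from my cuda_ooba fork; the exact wheel filename under result/ will differ depending on your setup):

```
cd GPTQ-for-LLaMa
# edit TORCH_ARCH_LIST and the CUDA base image inside the script to match your rig first
bash docker_build_whl.sh
# the wheel lands under result/; install it into the same environment as textgen-webui
pip install result/*.whl
```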
In summary, to use Ooba's textgen webui with GPTQ-for-LLaMa, you need to copy the GPTQ repo under the repositories folder and install the correct, locally built GPTQ wheel.
4.
The final step is to load the gptq model with the correct arguments. You can check out my fork of textgen-webui here (text-generation-webui/.env.example at dev · yhyu13/text-generation-webui · GitHub),
where I provide the arguments that I use locally.
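For reference, a launch command along these lines is what I mean by the correct arguments (the model folder name and group size here are assumptions, adjust them to match what you downloaded; see the linked .env.example for the exact values I use):

```
python server.py \
  --model oasst-rlhf-2-llama-30b-7k-steps-gptq-4bit \
  --model_type llama \
  --wbits 4 \
  --groupsize 128
```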
Hello, I think I get it, so I'll give it a try. In the end I'm switching my server over to Ubuntu; I can't manage it on Windows.
I'm sure I'll get it working in the next few days :)
Thanks for all the explanations.