It's a bit difficult to deploy the 70B model for verification, so let's keep an eye on how things develop

#4
by wawoshashi - opened

Deploying the 70B model myself for verification is a bit difficult; I'll keep an eye on how things develop.

Try this quantized version https://huggingface.co/TheBloke/Xwin-LM-70B-V0.1-GGUF, which needs only a 48GB VRAM card, or about 40GB of RAM for CPU-only inference.

You can try it now with llama.cpp.
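A minimal sketch of a CPU-only run through the llama-cpp-python bindings; the model filename and thread count are assumptions, so substitute whichever .gguf file you actually downloaded from the repo:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Filename and quantization level are assumptions; use the .gguf file you downloaded.
llm = Llama(
    model_path="./xwin-lm-70b-v0.1.Q4_K_M.gguf",
    n_ctx=4096,    # context window
    n_threads=8,   # CPU threads; tune for your machine
)

out = llm("USER: Hello, who are you?\nASSISTANT:", max_tokens=128)
print(out["choices"][0]["text"])
```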

There is also a 7B GPTQ version, https://huggingface.co/TheBloke/Xwin-LM-7B-V0.1-GPTQ, which needs only 6GB of VRAM.
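A hedged sketch of loading that GPTQ checkpoint through transformers (requires optimum and auto-gptq installed; the prompt and generation parameters are placeholders):

```python
# pip install transformers optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Xwin-LM-7B-V0.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the quantized weights on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("USER: Hello, who are you?\nASSISTANT:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```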

I can run the 70B quantized GGUF model (Q3_K_S, with 60 of 83 layers offloaded to the GPU) on a 3090 via llama.cpp.
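For reference, the llama-cpp-python bindings expose that partial offload as n_gpu_layers (needs a CUDA-enabled build; the filename is again an assumption):

```python
from llama_cpp import Llama

# Q3_K_S filename is an assumption; 60 of the model's 83 layers go to the GPU,
# the rest stay in system RAM, which fits a 24GB 3090.
llm = Llama(
    model_path="./xwin-lm-70b-v0.1.Q3_K_S.gguf",
    n_gpu_layers=60,
    n_ctx=4096,
)
print(llm("USER: Hi\nASSISTANT:", max_tokens=64)["choices"][0]["text"])
```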
