Parameters that need to change to run on GPU

#1
by greagain - opened

I like the idea of running GGUF models. Could you please list the changes needed to run the code on a GPU? I think I need to change CMAKE_ARGS? Also, if I already have the GGUF model downloaded and want to run it on my local computer, how should I change the code? I think I just change model_path=model_name?

Cran-May changed discussion status to closed
Cran-May changed discussion status to open

https://github.com/abetlen/llama-cpp-python
is a good reference page.
Pay attention to the section "Installation with Hardware Acceleration"; you may need to install some dependencies first (CUDA tooling, I guess).
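For example, a sketch of the CUDA-enabled install, assuming an NVIDIA GPU (check the README for the exact flag your version expects):

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir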
For running on your local computer, some interesting projects may help a lot. https://github.com/oobabooga/text-generation-webui
is a good option. Since I haven't tested it, maybe you'll find better tools on GitHub.
The most important thing is that you can download the model and use it (change this part):
from llama_cpp import Llama

model = Llama(
    model_path=model_name,  # path to a local .gguf file
    n_ctx=2000,             # context window size
    n_parts=1,
)
model_name should be replaced with the path to your downloaded file, e.g.
model_name = "./example/7B/just-example-yes-model.gguf"
And delete the code about snapshot_download; see the sketch below.
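Putting it together, a minimal sketch of GPU offload, assuming llama-cpp-python was built with CUDA support; the file path is just an example, and n_gpu_layers controls how many layers are offloaded:

from llama_cpp import Llama

model_name = "./example/7B/just-example-yes-model.gguf"  # example path; point this at your file

model = Llama(
    model_path=model_name,
    n_ctx=2000,
    n_gpu_layers=-1,  # offload all layers to the GPU; use a smaller number for partial offload
)

output = model("Q: What is a GGUF file? A:", max_tokens=32)
print(output["choices"][0]["text"])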

Since English is just my second language, I may not express myself clearly enough. I hope the above is helpful to you.
If you need a Chinese model, OpenBuddy also has many good projects. Released just a few hours ago,
https://huggingface.co/OpenBuddy/openbuddy-zephyr-7b-v14.1
is a good one, and it is quite fast even on pure CPU.
Below are TheBloke's quantized versions; I recommend "openbuddy-zephyr-7b-v14.1.Q4_K_M.gguf".
Generally speaking, Q4_K_M is the best value among the quantization schemes: its inference speed is only slightly behind Q4_0, Q2_K, and Q4_K_S, but clearly faster than Q3, Q5, Q6, and Q8, while the quality loss is small and acceptable, and its memory usage is noticeably lower than Q5 and Q6.
https://huggingface.co/TheBloke/openbuddy-zephyr-7B-v14.1-GGUF
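If you only want that single quantized file rather than the whole repo, a sketch using hf_hub_download from huggingface_hub (the filename is taken from the repo's file list):

from huggingface_hub import hf_hub_download

# Download one GGUF file into the local Hugging Face cache and get its path
model_name = hf_hub_download(
    repo_id="TheBloke/openbuddy-zephyr-7B-v14.1-GGUF",
    filename="openbuddy-zephyr-7b-v14.1.Q4_K_M.gguf",
)
print(model_name)  # pass this path to Llama(model_path=...)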

Thanks a lot 👍 I did try oobabooga's text-generation-webui, but it's too all-in-one (AIO) for me.

greagain changed discussion status to closed
