Split/shard support (#38)
opened by phymbert
Will it be possible to support the model sharding recently introduced in llama.cpp?
+1
Heya! @phymbert - definitely yes, do you mind pointing me to the relevant snippet?
We're currently just quantizing and uploading to the Hub: https://huggingface.co/spaces/ggml-org/gguf-my-repo/blob/main/app.py#L63
Happy for suggestions!
Hi @reach-vb,
I wrote a tutorial here: https://github.com/ggerganov/llama.cpp/discussions/6404
The --split-max-size option was fixed recently.
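To connect this to the Space's quantize-and-upload flow, here is a minimal sketch of how app.py might build a gguf-split invocation with --split-max-size before uploading. The file paths, the 5G limit, and the helper names are illustrative assumptions, not code from the Space; the flag names follow the linked tutorial.

```python
# Hedged sketch (not the actual app.py code): construct the gguf-split
# command that shards a quantized GGUF into pieces of at most `max_size`.
def build_split_command(model_path: str, out_prefix: str,
                        max_size: str = "5G") -> list[str]:
    return [
        "./gguf-split",          # built from llama.cpp
        "--split",
        "--split-max-size", max_size,
        model_path,              # input GGUF to shard
        out_prefix,              # prefix for the output shards
    ]

# gguf-split names the resulting shards <prefix>-00001-of-00003.gguf, etc.
def shard_name(prefix: str, index: int, total: int) -> str:
    return f"{prefix}-{index:05d}-of-{total:05d}.gguf"
```

The Space could then run the command with subprocess and upload each `shard_name(...)` file to the Hub instead of the single large GGUF.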
Please ping me if you need additional explanation.
Thanks!