Will it be possible to support the model sharding recently introduced in llama.cpp?

Heya! @phymbert - definitely yes, do you mind pointing me to the relevant snippet?

We're currently just quantizing and uploading to the Hub: https://huggingface.co/spaces/ggml-org/gguf-my-repo/blob/main/app.py#L63

Happy for suggestions!

Hi @reach-vb,

I wrote a tutorial here: https://github.com/ggerganov/llama.cpp/discussions/6404

The --split-max-size option was fixed recently.
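
For the Space, a rough sketch of how the split step could be wired in after quantization might look like the following. It only assembles and runs a gguf-split command. The binary path `./gguf-split`, the helper names `build_split_command` and `split_gguf`, and the choice to allow only one limit at a time are assumptions for illustration, not what app.py actually does. Please check the flags against the tool's `--help` output for your llama.cpp build.

```python
import re
import subprocess
from typing import List, Optional

# Sizes are passed as an integer followed by M or G, e.g. "512M" or "2G".
SIZE_PATTERN = re.compile(r"^\d+[MG]$")


def build_split_command(
    input_path: str,
    output_prefix: str,
    split_max_size: Optional[str] = None,
    split_max_tensors: Optional[int] = None,
    gguf_split_bin: str = "./gguf-split",  # assumed location of the binary
) -> List[str]:
    """Build the argument list for a gguf-split run (hypothetical helper)."""
    if split_max_size is not None and split_max_tensors is not None:
        # Design choice of this sketch: accept only one limit at a time.
        raise ValueError("pass either split_max_size or split_max_tensors")

    cmd = [gguf_split_bin, "--split"]
    if split_max_size is not None:
        if not SIZE_PATTERN.match(split_max_size):
            raise ValueError(f"unexpected size format: {split_max_size!r}")
        cmd += ["--split-max-size", split_max_size]
    elif split_max_tensors is not None:
        cmd += ["--split-max-tensors", str(split_max_tensors)]

    # Positional arguments: the input GGUF file, then the output prefix.
    cmd += [input_path, output_prefix]
    return cmd


def split_gguf(input_path: str, output_prefix: str, **limits) -> None:
    """Run the split; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(build_split_command(input_path, output_prefix, **limits), check=True)
```

Calling `split_gguf("model-q4_k_m.gguf", "model-q4_k_m", split_max_size="2G")` would then produce the shards next to the quantized file, and they could be uploaded to the Hub the same way the single file is today.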

Please ping me if you need further explanation.


Hello! I have implemented this and will open a PR :) I tried to keep the additions minimal to avoid cluttering the interface, so let me know if there's anything I should change in the layout or elsewhere before merging, and I'll gladly do so.

Closing this as solved! Thanks @SixOpen ❤️

