
continued from https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/78
Sadly, no clue why I have no permissions to push; `huggingface-cli whoami` says the login is valid.

The diffs are naturally still off because, after the denies, I uploaded through `huggingface-cli upload ggml-org/gguf-my-repo . . --repo-type space --revision refs/pr/80` instead. Looking forward to a solution regarding the auth; it's still in draft mode, so all good until we find a way :)

Will we be able to submit our own .txt file for imatrix generation? That would be really cool. I hope this gets merged soon; it's a game changer.

Of course! The fallback file is there solely for less familiar users who would try to quantize without providing their own :)

@SixOpen - can you try creating a pull request like this: https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-advanced-usage - this way you should have the correct diff.

Otherwise maybe you can open a PR through the UI :/

All ready! Oddly, it still didn't push after the `huggingface-cli login`, but `git remote set-url origin` with username and token did! Glad I didn't have to clutter you with that many separate PRs, and thanks for the patience 😆
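For anyone hitting the same auth wall, the token-based remote that ended up working looks roughly like this (USER and TOKEN are placeholders, not actual credentials; the PR ref is the `refs/pr/80` from the earlier upload):

```shell
# Point origin at a URL that embeds the credentials,
# then push the local branch directly to the PR ref.
git remote set-url origin https://USER:TOKEN@huggingface.co/spaces/ggml-org/gguf-my-repo
git push origin HEAD:refs/pr/80
```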

ggml.ai org

Brilliant! Reviewing it now!

reach-vb changed pull request status to open
ggml.ai org

This generally looks good to me! Thanks for keeping it clean! Would really like it if @ggerganov could give it a review too!

ggml.ai org
  • Looks like we are calling make twice: once in the Dockerfile ENTRYPOINT and once more in start.sh. Maybe it is better to just call it in the Dockerfile like this:

```dockerfile
ENTRYPOINT ["/bin/bash", "-c", "cd llama.cpp && LLAMA_CUDA=1 make -j quantize gguf-split imatrix && cd .. && /bin/sh start.sh"]
```

And simplify start.sh to just:

```shell
python app.py
```

  • In app.py, is it necessary to compile again? If not, then generate_importance_matrix can be simplified.

  • Since the imatrix computation can take a lot of time if the training data is too big, we can put a time limit on the imatrix command, let's say 1 minute. If the process does not finish within this time limit, it gets killed and we use whatever `imatrix.dat` has been generated last (the imatrix tool periodically writes the current result to `imatrix.dat`; see the `--output-frequency` CLI argument).
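The time-limit idea above could be sketched like this (illustrative only; the function name, binary path, and call site are assumptions, not the PR's actual code). Sending SIGINT first gives the imatrix tool a chance to exit cleanly; either way, the last periodically written `imatrix.dat` on disk is what gets used:

```python
import signal
import subprocess

def run_with_time_limit(cmd, time_limit_s, grace_s=5):
    """Run cmd, stopping it if it exceeds time_limit_s seconds.

    SIGINT is tried first so the tool can shut down cleanly; if it
    ignores that for grace_s seconds, it is killed outright. The most
    recent periodic imatrix.dat on disk is kept in either case.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=time_limit_s)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # ask for a clean shutdown
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # still running: force-terminate
            proc.wait()
    return proc.returncode
```

A hypothetical call from `generate_importance_matrix` might then look like `run_with_time_limit(["./llama.cpp/imatrix", "-m", model_path, "-f", train_data, "-o", "imatrix.dat", "--output-frequency", "10"], 60)`, with the `--output-frequency` interval chosen well below the time limit so at least one snapshot lands on disk.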

Great calls :) The time limit is definitely something we should have; will add that in a bit! Looks like while stashing, start.sh remained on the version prior to the entrypoint tweaks, and some LFS shenanigans might have affected the txt as well, but I'll update the branch to take care of all of that 😄 along with the superfluous compile in app.py.

Thanks @ggerganov for the review! and thanks @SixOpen for updating the PR.

Small comment - let's keep the build process in start.sh. This is because Spaces sometimes build the Dockerfile in a different environment from the one the final space runs on.

If the build happens during start.sh, then we make sure that the build is correct as per the hardware assigned to the space (this also makes it easy for people to duplicate this space).
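Under that constraint, start.sh could look roughly like this (a sketch based on the make targets discussed above; the actual paths and flags in the PR may differ), with the Dockerfile ENTRYPOINT reduced to just launching it:

```shell
#!/bin/bash
# Build on the hardware actually assigned to the space,
# so the binaries match its GPU/CPU, then hand off to the app.
cd llama.cpp
LLAMA_CUDA=1 make -j quantize gguf-split imatrix
cd ..
python app.py
```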

Question: how are we ensuring the imatrix process goes on only for a minute?

EDIT: Never mind, saw the signal code. LGTM.

ggml.ai org

Agreed on moving the build inside start.sh. Btw, the 1-minute timeout was an example; I'm not sure what number would make sense, so feel free to experiment if it is too short or too long.

Kalomaze's groups_merged is a very popular imatrix dataset: https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

I suggest running that on a large model and seeing how long it takes, then adding a few minutes in case people want to add something to it.

ggml.ai org

Oh wow, this dataset looks great! Can we please use it as the default in our case, @SixOpen? 🤗

Of course! Good to know about Spaces :) Will update soon, covering all of the above.

ggml.ai org

Nice, thanks @SixOpen - let me know when you've made the update and I can then re-review and merge this!
I'll also start a discussion to highlight your contribution to the repo too! ❤️

Very much looking forward to this PR getting merged.

@reach-vb Thanks for the wait, let me know if I should add anything else 🤗

ggml.ai org

Lovely! Looks good to me! 🚀

reach-vb changed pull request status to merged
