
continued from https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/78
Sadly, no clue why I have no permissions to push; `huggingface-cli whoami` says the login is valid.

The diffs are naturally still off because, after the denies, I uploaded through `huggingface-cli upload ggml-org/gguf-my-repo . . --repo-type space --revision refs/pr/80` instead. Looking forward to a solution regarding the auth; it's still in draft mode, so all good until we find a way :)

Will we be able to submit our own .txt file for imatrix generation? That would be really cool. I hope this gets merged soon; it's a game changer.

Of course! The fallback file is there solely for less familiar users who would try to quantize without providing their own :)

@SixOpen - can you try creating a pull request like this: https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-advanced-usage - this way you should have the correct diff.

Otherwise maybe you can open a PR through the UI :/

All ready! Oddly, it still didn't push after the `huggingface-cli login`, but `git remote set-url origin` with username and token did! Glad I didn't have to clutter you with that many separate PRs, and thanks for the patience 😆
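For anyone hitting the same auth wall, the token-based remote that ended up working looks roughly like this (USER and TOKEN are placeholders, not actual credentials; the PR ref is the `refs/pr/80` from the earlier upload):

```shell
# Point origin at a URL that embeds the credentials,
# then push the local branch directly to the PR ref.
git remote set-url origin https://USER:TOKEN@huggingface.co/spaces/ggml-org/gguf-my-repo
git push origin HEAD:refs/pr/80
```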

ggml.ai org

Brilliant! Reviewing it now!

reach-vb changed pull request status to open
ggml.ai org

This generally looks good to me! Thanks for keeping it clean! Would really like it if @ggerganov could give it a review too!

ggml.ai org
  • Looks like we are calling make twice: once in the Dockerfile ENTRYPOINT and once more in start.sh. Maybe it is better to just call it in the Dockerfile like this:

```dockerfile
ENTRYPOINT ["/bin/bash", "-c", "cd llama.cpp && LLAMA_CUDA=1 make -j quantize gguf-split imatrix && cd .. && /bin/sh start.sh"]
```

And simplify start.sh to just:

```shell
python app.py
```

  • In app.py, is it necessary to compile again? If not, then generate_importance_matrix can be simplified.

  • Since the imatrix computation can take a lot of time if the training data is too big, we can put a time limit on the imatrix command, let's say 1 minute. If the process does not finish within this time limit, it gets killed and we use whatever `imatrix.dat` has been generated last (the imatrix tool periodically writes the current result to `imatrix.dat`; see the `--output-frequency` CLI argument).
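The time-limit idea above could be sketched like this (illustrative only; the function name, binary path, and call site are assumptions, not the PR's actual code). Sending SIGINT first gives the imatrix tool a chance to exit cleanly; either way, the last periodically written `imatrix.dat` on disk is what gets used:

```python
import signal
import subprocess

def run_with_time_limit(cmd, time_limit_s, grace_s=5):
    """Run cmd, stopping it if it exceeds time_limit_s seconds.

    SIGINT is tried first so the tool can shut down cleanly; if it
    ignores that for grace_s seconds, it is killed outright. The most
    recent periodic imatrix.dat on disk is kept in either case.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=time_limit_s)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # ask for a clean shutdown
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # still running: force-terminate
            proc.wait()
    return proc.returncode
```

A hypothetical call from `generate_importance_matrix` might then look like `run_with_time_limit(["./llama.cpp/imatrix", "-m", model_path, "-f", train_data, "-o", "imatrix.dat", "--output-frequency", "10"], 60)`, with the `--output-frequency` interval chosen well below the time limit so at least one snapshot lands on disk.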

Great calls :) The time limit is definitely something we should have; will add that in a bit! Looks like while stashing, start.sh remained on the version prior to the entrypoint tweaks, and some LFS shenanigans might have affected the txt as well, but I'll update the branch to take care of all of that 😄 along with the superfluous compile in app.py.

Thanks @ggerganov for the review! and thanks @SixOpen for updating the PR.

Small comment - let's keep the build process in start.sh. This is because Spaces sometimes build the Dockerfile in a different environment from the one the final space runs on.

If the build happens during start.sh, then we make sure that the build is correct as per the hardware assigned to the space (this also makes it easy for people to duplicate this space).
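Under that constraint, start.sh could look roughly like this (a sketch based on the make targets discussed above; the actual paths and flags in the PR may differ), with the Dockerfile ENTRYPOINT reduced to just launching it:

```shell
#!/bin/bash
# Build on the hardware actually assigned to the space,
# so the binaries match its GPU/CPU, then hand off to the app.
cd llama.cpp
LLAMA_CUDA=1 make -j quantize gguf-split imatrix
cd ..
python app.py
```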

Question: how are we ensuring the imatrix process goes on only for a minute?

EDIT: Never mind, saw the signal code. LGTM.

ggml.ai org

Agreed on moving the build inside start.sh. Btw, the 1-minute timeout was an example; I'm not sure what number would make sense, so feel free to experiment if it is too short or too long.

Kalomaze's groups_merged is a very popular imatrix dataset: https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

I suggest running that on a large model and seeing how long it takes, then adding a few minutes in case people want to add something to it.

ggml.ai org

Oh wow, this dataset looks great! Can we please use it as the default in our case, @SixOpen? 🤗

Of course! Good to know about Spaces :) Will update soon, covering all of the above.

ggml.ai org

Nice, thanks @SixOpen - let me know when you've made the update and I can then re-review and merge this!
I'll also start a discussion to highlight your contribution to the repo too! ❤️

Very much looking forward to this PR getting merged.

@reach-vb Thanks for the wait, let me know if I should add anything else 🤗

ggml.ai org

Lovely! Looks good to me! 🚀

reach-vb changed pull request status to merged
