Failed to quantize
I keep trying to quantize your models, and every Minitron-4B derivative of the Llama 3.1 base model fails with this error:
FourOhFour-Maelstrom_4B
Using CUDA. Available GPU memory: 23.60 GB
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Quantization failed: Trying to set a tensor of shape torch.Size([1024, 3072]) in "weight" (which has shape torch.Size([768, 3072])), this looks incorrect.
An error occurred during the quantization process: Trying to set a tensor of shape torch.Size([1024, 3072]) in "weight" (which has shape torch.Size([768, 3072])), this looks incorrect.
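For what it's worth, the numbers line up with a head_dim mismatch: Minitron-style prunes ship an explicit head_dim (128) in the config, while older transformers releases derive it as hidden_size // num_attention_heads (3072 // 32 = 96), so the checkpoint's k_proj/v_proj weights have 8 x 128 = 1024 rows while the freshly built model only allocates 8 x 96 = 768. A minimal sketch to check this (the repo id is assumed from the model name in this thread):

from transformers import AutoConfig

# Repo id is an assumption based on the model name quoted above.
cfg = AutoConfig.from_pretrained("FourOhFour/Maelstrom_4B")

kv_heads = cfg.num_key_value_heads                    # e.g. 8
derived = cfg.hidden_size // cfg.num_attention_heads  # 3072 // 32 = 96
explicit = getattr(cfg, "head_dim", None) or derived  # 128 if the config sets it

print("k_proj rows in the checkpoint:", kv_heads * explicit)          # 8 * 128 = 1024
print("k_proj rows old transformers allocates:", kv_heads * derived)  # 8 * 96  = 768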
Have you tried updating transformers to 4.45.0? The development build can be installed with:
pip install -U transformers@git+https://github.com/huggingface/transformers.git
Well, this is the crux of my problem. Advancing my transformers version (which requires a force upgrade that breaks its dependencies) forces me to change my torch (CUDA) version, and even when I recompile the AutoAWQ kernels, the model I produce responds with nothing but "!!!!".
So far, the most stable quantization pipeline I can get uses torch 2.4.0 with transformers 4.44.2.
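To keep that combination from drifting, a small guard at the top of the pipeline script can fail fast; a minimal sketch, with the pins taken from the versions quoted above:

import torch
import transformers

# Pins match the combination reported as stable in this thread.
EXPECTED = {"torch": "2.4.0", "transformers": "4.44.2"}

for name, module in (("torch", torch), ("transformers", transformers)):
    found = module.__version__.split("+")[0]  # drop local tags like "+cu121"
    if found != EXPECTED[name]:
        raise RuntimeError(f"{name} {found} does not match pinned {EXPECTED[name]}")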
When I override the dependency pins and force transformers 4.45.x, as mentioned above, the quantized model again returns nothing but "!!!!".
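A quick post-quantization smoke test can catch that failure mode before a broken repo gets pushed. A rough sketch using AutoAWQ's from_quantized loader; the output path and prompt are placeholders:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "Maelstrom_4B-AWQ"  # placeholder for the local quant output dir
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

tokens = tokenizer("The capital of France is", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=16)
text = tokenizer.decode(output[0], skip_special_tokens=True)

# A healthy quant continues the sentence; the broken ones degenerate to "!".
assert "!!!!" not in text, f"degenerate output: {text!r}"
print(text)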
Not sure how to proceed. I don't think this is actually a problem with your model, so I'm sorry for venting my frustrations here.
My best suggestion is to try another quant format. Both Exllama and GGUF have been confirmed to work by members of the Anthracite organization.
I understand. I run the SolidRusT (SRT) organization, and we just do AWQ: https://huggingface.co/solidrust
I'll try again another time. Thank you for your responses.