Example of how to use the models

#3
by milyiyo - opened

Hi @maderix :)

Do you have any Colab notebook showing how to use these models, or a Space on Hugging Face?

You can follow the steps here: https://github.com/qwopqwop200/GPTQ-for-LLaMa
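For reference, 4-bit inference then looks roughly like this. This is a rough sketch only: the `load_quant` helper and its signature are assumptions based on that repo's `llama_inference.py` at the time, and the repo id / checkpoint name are placeholders, so check the repo's README for the current entry points.

```python
# Rough sketch only: load_quant and its signature are assumptions based on
# GPTQ-for-LLaMa's llama_inference.py at the time; the repo id and checkpoint
# name below are placeholders.
import torch
from transformers import AutoTokenizer
from llama_inference import load_quant  # from a GPTQ-for-LLaMa checkout (assumed helper)

MODEL_DIR = "decapoda-research/llama-7b-hf"  # converted repo providing config.json/tokenizer (example)
CHECKPOINT = "llama7b-4bit.pt"               # 4-bit GPTQ checkpoint (example name)

model = load_quant(MODEL_DIR, CHECKPOINT, 4)  # 4 = wbits
model.to("cuda").eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```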

Can you please also include config.json for the checkpoints?

> Can you please also include config.json for the checkpoints?

You can use the config from the already converted models. I am using the 30B config with the 4-bit model and it's working fine.
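For example, something along these lines should work (a rough sketch; the repo id is just one example of an already-converted model, and the target directory is a placeholder):

```python
# Sketch: download config.json from an already-converted LLaMA repo and drop it
# next to the 4-bit checkpoint. The repo id is just an example; pick the one
# matching your model size.
import shutil
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="decapoda-research/llama-30b-hf",  # example converted 30B repo
    filename="config.json",
)
shutil.copy(config_path, "./llama-30b-4bit/config.json")  # directory holding the 4-bit weights
```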

Does this work with llama.cpp?

llama.cpp uses a different format; it has a script for converting the original .pth checkpoints to GGML format. I doubt these models will work, as they are already quantized.
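Roughly, the llama.cpp flow is something like this (a sketch only; the `convert-pth-to-ggml.py` script and `quantize` binary names are assumptions based on the llama.cpp repo at the time, so check its README):

```python
# Sketch of the llama.cpp flow: convert the original .pth shards to a GGML f16
# file, then quantize to 4 bits. Script/binary names (convert-pth-to-ggml.py,
# ./quantize) are assumptions based on the llama.cpp repo at the time.
import subprocess

MODEL_DIR = "models/7B"  # directory containing the original consolidated.*.pth files

# 1) .pth -> GGML f16
subprocess.run(["python", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

# 2) GGML f16 -> 4-bit (q4_0)
subprocess.run(
    [
        "./quantize",
        f"{MODEL_DIR}/ggml-model-f16.bin",
        f"{MODEL_DIR}/ggml-model-q4_0.bin",
        "2",
    ],
    check=True,
)
```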

Makes sense. Do you know how much RAM it takes to quantize the 65B model to 4 bits for use in llama.cpp with their quantization script?

I've heard it takes over 100 GB of RAM/swap to quantize the 65B model.

Yeah, it took around 126 GB in my case; not sure if that's a bug in the GPTQ conversion script.

https://github.com/amrrs/llama-4bit-colab/blob/main/LLaMA_4_bit_on_Google_Colab.ipynb

Following this notebook and using the converted weights from here gives me the error below. Is anyone else facing this issue? Any help would be great, thanks!

```
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
size mismatch for model.layers.1.self_attn.q_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.self_attn.k_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.self_attn.v_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.self_attn.o_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.mlp.gate_proj.scales: copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
size mismatch for model.layers.1.mlp.down_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
....
```

> llama.cpp uses a different format; it has a script for converting the original .pth checkpoints to GGML format. I doubt these models will work, as they are already quantized.

There's little reason for quantization to affect the format. [Edit: I see this may not be true; it is notable that the PyTorch format is loaded similarly.]

> https://github.com/amrrs/llama-4bit-colab/blob/main/LLaMA_4_bit_on_Google_Colab.ipynb
>
> Following this notebook and using the converted weights from here gives me the error below. Is anyone else facing this issue? Any help would be great, thanks!

It looks like one of the two projects you are combining expects the weights transposed. You may need to load them manually and replace each one with the result of calling .T to get them to load. The maintainer of the Colab can likely help more, as it is a much simpler change in code than in data.
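Something along these lines might work as a one-off fix (an untested sketch; it assumes only the `*.scales` tensors from the traceback need transposing, and the file names are placeholders):

```python
# Untested sketch: transpose the mismatched `scales` tensors in the checkpoint
# so their shape matches what the notebook's code expects ([1, N] vs [N, 1]).
# File names below are placeholders.
import torch

ckpt = torch.load("llama-7b-4bit.pt", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest the state dict under "model"

for name, tensor in state.items():
    # only the per-channel quantization scales show up in the size-mismatch errors
    if name.endswith(".scales") and torch.is_tensor(tensor) and tensor.ndim == 2 and tensor.shape[1] == 1:
        state[name] = tensor.T.contiguous()

torch.save(ckpt, "llama-7b-4bit-transposed.pt")
```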
