Example of how to use the models

#3
by milyiyo - opened

Hi @maderix :)

Do you have any Colab notebook showing how to use these models, or a Space on Hugging Face?

You can follow the steps here: https://github.com/qwopqwop200/GPTQ-for-LLaMa
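For reference, 4-bit inference then looks roughly like this. This is a rough sketch only: the `load_quant` helper and its signature are assumptions based on that repo's `llama_inference.py` at the time, and the repo id / checkpoint name are placeholders, so check the repo's README for the current entry points.

```python
# Rough sketch only: load_quant and its signature are assumptions based on
# GPTQ-for-LLaMa's llama_inference.py at the time; the repo id and checkpoint
# name below are placeholders.
import torch
from transformers import AutoTokenizer
from llama_inference import load_quant  # from a GPTQ-for-LLaMa checkout (assumed helper)

MODEL_DIR = "decapoda-research/llama-7b-hf"  # converted repo providing config.json/tokenizer (example)
CHECKPOINT = "llama7b-4bit.pt"               # 4-bit GPTQ checkpoint (example name)

model = load_quant(MODEL_DIR, CHECKPOINT, 4)  # 4 = wbits
model.to("cuda").eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```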

Can you please also include config.json for the checkpoints?

> Can you please also include config.json for the checkpoints?

You can use the config from the already converted models. I am using the 30B config with the 4-bit model and it's working fine.
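For example, something along these lines should work (a rough sketch; the repo id is just one example of an already-converted model, and the target directory is a placeholder):

```python
# Sketch: download config.json from an already-converted LLaMA repo and drop it
# next to the 4-bit checkpoint. The repo id is just an example; pick the one
# matching your model size.
import shutil
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="decapoda-research/llama-30b-hf",  # example converted 30B repo
    filename="config.json",
)
shutil.copy(config_path, "./llama-30b-4bit/config.json")  # directory holding the 4-bit weights
```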

Does this work with llama.cpp?

llama.cpp uses a different format; it has a script for converting the original .pth checkpoints to GGML format. I doubt these models will work, as they are already quantized.
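Roughly, the llama.cpp flow is something like this (a sketch only; the `convert-pth-to-ggml.py` script and `quantize` binary names are assumptions based on the llama.cpp repo at the time, so check its README):

```python
# Sketch of the llama.cpp flow: convert the original .pth shards to a GGML f16
# file, then quantize to 4 bits. Script/binary names (convert-pth-to-ggml.py,
# ./quantize) are assumptions based on the llama.cpp repo at the time.
import subprocess

MODEL_DIR = "models/7B"  # directory containing the original consolidated.*.pth files

# 1) .pth -> GGML f16
subprocess.run(["python", "convert-pth-to-ggml.py", MODEL_DIR, "1"], check=True)

# 2) GGML f16 -> 4-bit (q4_0)
subprocess.run(
    [
        "./quantize",
        f"{MODEL_DIR}/ggml-model-f16.bin",
        f"{MODEL_DIR}/ggml-model-q4_0.bin",
        "2",
    ],
    check=True,
)
```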

Makes sense. Do you know how much RAM it takes to quantize the 65B model to 4 bits for use in llama.cpp with their quantization script?

I've heard it takes over 100 GB of RAM/swap to quantize the 65B model.

Yeah, it took around 126 GB in my case; not sure if that's a bug in the GPTQ conversion script.

https://github.com/amrrs/llama-4bit-colab/blob/main/LLaMA_4_bit_on_Google_Colab.ipynb

Following this notebook and using the converted weights from here gives me the error below. Is anyone else facing this issue? Any help would be great, thanks!

```
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
size mismatch for model.layers.1.self_attn.q_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.self_attn.k_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.self_attn.v_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.self_attn.o_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
size mismatch for model.layers.1.mlp.gate_proj.scales: copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008]).
size mismatch for model.layers.1.mlp.down_proj.scales: copying a param with shape torch.Size([4096, 1]) from checkpoint, the shape in current model is torch.Size([1, 4096]).
....
```

> llama.cpp uses a different format; it has a script for converting the original .pth checkpoints to GGML format. I doubt these models will work, as they are already quantized.

There's little reason for quantization to affect the format. [Edit: I see this may not be true; it is notable that the PyTorch format is loaded similarly.]

> https://github.com/amrrs/llama-4bit-colab/blob/main/LLaMA_4_bit_on_Google_Colab.ipynb
>
> Following this notebook and using the converted weights from here gives me the error below. Is anyone else facing this issue? Any help would be great, thanks!

It looks like one of the two projects you are combining expects the weights transposed. You may need to load them manually and replace each one with the result of calling .T to get them to load. The maintainer of the Colab can likely help more, as it is a much simpler change in code than in data.
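Something along these lines might work as a one-off fix (an untested sketch; it assumes only the `*.scales` tensors from the traceback need transposing, and the file names are placeholders):

```python
# Untested sketch: transpose the mismatched `scales` tensors in the checkpoint
# so their shape matches what the notebook's code expects ([1, N] vs [N, 1]).
# File names below are placeholders.
import torch

ckpt = torch.load("llama-7b-4bit.pt", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest the state dict under "model"

for name, tensor in state.items():
    # only the per-channel quantization scales show up in the size-mismatch errors
    if name.endswith(".scales") and torch.is_tensor(tensor) and tensor.ndim == 2 and tensor.shape[1] == 1:
        state[name] = tensor.T.contiguous()

torch.save(ckpt, "llama-7b-4bit-transposed.pt")
```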
