Add GPTQ-loading with πŸ€— transformers lib

#3
by marksverdhei - opened

This pull request converts the model files so that the model can be loaded directly with the πŸ€— transformers library, by converting the files to the same format as TheBloke's models.
This includes:

  • Adding the quantization config to config.json
  • Adding metadata to model.safetensors: {"format": "pt", "quantized_by": "RuterNorway"} (both changes can be verified with the sketch below)
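
For reference, here is a minimal verification sketch, not part of the PR itself: it assumes the repository has been downloaded to a local directory (the path below is hypothetical) and uses the safetensors library to read the added header metadata.

import json
from safetensors import safe_open

# Hypothetical local path to the downloaded repository.
repo_dir = "Llama-2-13b-chat-norwegian-GPTQ"

# config.json should now carry a quantization_config block.
with open(f"{repo_dir}/config.json") as f:
    config = json.load(f)
print(config.get("quantization_config"))

# model.safetensors should now carry the added header metadata.
with safe_open(f"{repo_dir}/model.safetensors", framework="pt") as f:
    print(f.metadata())  # expected: {"format": "pt", "quantized_by": "RuterNorway"}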

Then, given that the libraries transformers, optimum, and auto-gptq are installed, you should be able to load it like this:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RuterNorway/Llama-2-13b-chat-norwegian-GPTQ")
# Because config.json now includes the quantization config, transformers
# dispatches to auto-gptq and loads the quantized weights automatically.
model = AutoModelForCausalLM.from_pretrained("RuterNorway/Llama-2-13b-chat-norwegian-GPTQ")
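
Once loaded, generation works as with any transformers causal LM. A short usage sketch follows; the prompt and generation settings are illustrative and not part of the PR.

prompt = "Hva er hovedstaden i Norge?"  # illustrative Norwegian prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))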
marksverdhei changed pull request title from GPTQ-loading with πŸ€— transformers lib to Add GPTQ-loading with πŸ€— transformers lib

Great work. Tested and works as expected.

RuterNorway changed pull request status to open
RuterNorway changed pull request status to merged
