How to quantize this model using QLoRA?

#7
by mrhimanshu - opened

I'm trying to convert this model to 4-bit, but it's failing when generating a response:

ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)

The code is below:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"

model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "Hello my name is"
device = "cuda:0"

inputs = tokenizer(text, return_tensors="pt").to(device)
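# The ValueError above is raised by the next line: the tokenizer's output
# includes token_type_ids, which Falcon's generate() does not accept.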
outputs = model.generate(**inputs, max_length=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
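Since the title asks about QLoRA specifically: the quantization half of QLoRA is usually expressed through a BitsAndBytesConfig rather than the bare load_in_4bit flag. A minimal sketch using the NF4 settings from the QLoRA paper (the compute dtype and double quantization are common choices, not something this thread specifies):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

For actual QLoRA fine-tuning you would then attach LoRA adapters (e.g. with the peft library) on top of this 4-bit base model.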

Technology Innovation Institute org

See this discussion for a solution 👍
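For reference, the error message itself points at the fix: generate() is being passed a token_type_ids kwarg it does not use. The Falcon tokenizer returns that key by default, so a likely solution (assuming the linked discussion recommends the same approach) is simply not to pass it:

inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False).to(device)
# or, equivalently, drop the key after tokenizing:
# inputs = tokenizer(text, return_tensors="pt").to(device)
# inputs.pop("token_type_ids", None)
outputs = model.generate(**inputs, max_length=60)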

FalconLLM changed discussion status to closed
