Need example of model usage

#4
by LeMoussel - opened

Can you give an example of using this LLM?
Is it used with transformers the same way as meta-llama/Meta-Llama-3.1-70B-Instruct?

IST Austria Distributed Algorithms and Systems Lab org

Yes, it is the same.

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM

# device_map="auto" already places the model on GPU, so no extra .cuda() call is needed.
quantized_model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf",
    device_map="auto", trust_remote_code=True, torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf")

# Optional warm-up: the first generate call is slow because the AQLM kernels
# are compiled on first use.
_ = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)

inputs = tokenizer(["An increasing sequence: one,"], return_tensors="pt")["input_ids"].cuda()
streamer = TextStreamer(tokenizer)
_ = quantized_model.generate(inputs, streamer=streamer, max_new_tokens=120)
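One difference worth noting for instruct-tuned checkpoints like Meta-Llama-3.1-70B-Instruct: instead of passing raw text, chat turns are normally wrapped with the model's chat template via `tokenizer.apply_chat_template(...)`. As a rough, hedged sketch of the prompt string that the Llama 3.x template produces (the special-token layout below is an approximation; in practice let the tokenizer build it):

```python
def build_llama3_prompt(user_message: str) -> str:
    # Approximation of the Llama 3.x chat layout. The authoritative template
    # ships with the tokenizer and should be applied with
    # tokenizer.apply_chat_template(messages, add_generation_prompt=True).
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("An increasing sequence: one,")
print(prompt)
```

The resulting string would then be tokenized and passed to `generate` exactly as in the snippet above; only the prompt formatting differs between base and instruct models.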
