Need example of model usage

#4
by LeMoussel - opened

Can you give an example of using this LLM?
Is it used with transformers the same way as meta-llama/Meta-Llama-3.1-70B-Instruct?

IST Austria Distributed Algorithms and Systems Lab org

Yes, it is the same.

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM

# device_map="auto" already places the model on GPU, so no extra .cuda() call is needed.
quantized_model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf",
    device_map="auto", trust_remote_code=True, torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf")

# Optional warm-up: the first generate call is slow because the AQLM kernels
# are compiled on first use.
_ = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)

inputs = tokenizer(["An increasing sequence: one,"], return_tensors="pt")["input_ids"].cuda()
streamer = TextStreamer(tokenizer)
_ = quantized_model.generate(inputs, streamer=streamer, max_new_tokens=120)
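One difference worth noting for instruct-tuned checkpoints like Meta-Llama-3.1-70B-Instruct: instead of passing raw text, chat turns are normally wrapped with the model's chat template via `tokenizer.apply_chat_template(...)`. As a rough, hedged sketch of the prompt string that the Llama 3.x template produces (the special-token layout below is an approximation; in practice let the tokenizer build it):

```python
def build_llama3_prompt(user_message: str) -> str:
    # Approximation of the Llama 3.x chat layout. The authoritative template
    # ships with the tokenizer and should be applied with
    # tokenizer.apply_chat_template(messages, add_generation_prompt=True).
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("An increasing sequence: one,")
print(prompt)
```

The resulting string would then be tokenized and passed to `generate` exactly as in the snippet above; only the prompt formatting differs between base and instruct models.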
