## Requirements

Install the following libraries:

```bash
pip install transformers accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed autoawq
```
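Before loading the model, it can help to confirm the environment, since AutoAWQ's fused kernels require a CUDA-capable GPU. A minimal, optional sanity check (a sketch, not part of the original instructions):

```python
import torch
import transformers

# AWQ inference runs on GPU; this should print True before loading the model.
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```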
## Usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Quantization settings used for this checkpoint (kept for reference):
# quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
#
# Equivalent transformers.AwqConfig, if you want to build the config dict yourself:
# quantization_config = AwqConfig(
#     bits=quant_config["w_bit"],
#     group_size=quant_config["q_group_size"],
#     zero_point=quant_config["zero_point"],
#     version=quant_config["version"].lower(),
# ).to_dict()

# Load the tokenizer and the AWQ-quantized model.
tokenizer_abel = AutoTokenizer.from_pretrained("AbelKidane/abelqwen4", trust_remote_code=True)
model_abel = AutoAWQForCausalLM.from_quantized(
    "AbelKidane/abelqwen4",
    fuse_layers=True,
    safetensors=True,
    trust_remote_code=True,
)

# Generate a short completion on GPU 0.
text = "The capital city of "
inputs = tokenizer_abel(text, return_tensors="pt").to(0)
out = model_abel.generate(**inputs, max_new_tokens=5)
print(tokenizer_abel.decode(out[0], skip_special_tokens=True))
```
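Alternatively, recent transformers releases (4.35+) can load AWQ checkpoints directly through `AutoModelForCausalLM.from_pretrained` when autoawq is installed, picking up the quantization settings from the model config. A sketch of that path (not from the original card; exact behavior depends on your transformers and autoawq versions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AbelKidane/abelqwen4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "AbelKidane/abelqwen4",
    device_map="auto",  # requires accelerate
    trust_remote_code=True,
)

inputs = tokenizer("The capital city of ", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```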