MISHANM/Qwen-QwQ-32B-fp8

Introducing the fp8 quantized version of Qwen/QwQ-32B. Quantizing the weights to 8-bit floating point roughly halves the memory footprint relative to fp16/bf16 and can improve inference throughput on hardware with native fp8 support, at a small cost in accuracy. The base model's thinking and reasoning capabilities carry over, so the quantized model remains well suited to downstream tasks involving complex, challenging problems.
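
The card does not state which toolchain produced the fp8 weights. For readers who want to reproduce a similar conversion, here is a minimal sketch using optimum-quanto (one of several fp8-capable quantizers; its use here is an assumption, not the author's documented method):

import torch
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qfloat8

# Hypothetical reproduction: load the bf16 base model, then quantize weights to fp8.
# optimum-quanto is an assumption; the card does not name the actual tool used.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B", torch_dtype=torch.bfloat16, device_map="auto"
)
quantize(model, weights=qfloat8)  # replace linear-layer weights with fp8 tensors
freeze(model)                     # materialize the quantized weights in place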

Model Details

  1. Tasks: Causal Language Modeling, Text Generation
  2. Base Model: Qwen/QwQ-32B
  3. Quantization Format: fp8

Device Used

  1. GPUs: 1 × AMD Instinct™ MI210 Accelerator
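
As a rough sanity check on why a single MI210 suffices (approximate figures, not from the original card): at 8 bits per weight, the model's ~32.8B parameters occupy about 33 GB, which fits within the MI210's 64 GB of HBM2e, whereas the same weights in bf16 (~66 GB) would exceed it before even accounting for the KV cache and activations.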

Inference with Hugging Face Transformers


from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and tokenizer
model_path = "MISHANM/Qwen-QwQ-32B-fp8"

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_new_tokens=1000, temperature=0.9):
    # Build the conversation
    messages = [
        {
            "role": "system",
            "content": "Give response to the user query.",
        },
        {"role": "user", "content": prompt},
    ]

    # Apply the model's own chat template instead of a hand-written format string
    formatted_prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Tokenize, move the inputs to the model's device, and generate
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )

    # Decode only the newly generated tokens, skipping the echoed prompt
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example usage
prompt = "Write a poem about large language models."
text = generate_text(prompt)
print(text)

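For interactive use, the same model and tokenizer can stream tokens to stdout as they are generated. A minimal sketch using transformers' TextStreamer (the streaming setup is illustrative, not part of the original card), reusing model and tokenizer from above:

from transformers import TextStreamer

# Print tokens as they are produced, omitting the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [{"role": "user", "content": "Write a poem about large language models."}]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)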

Citation Information

@misc{MISHANM/Qwen-QwQ-32B-fp8,
  author    = {Mishan Maurya},
  title     = {Introducing fp8 quantized version of Qwen/QwQ-32B},
  year      = {2025},
  publisher = {Hugging Face},
  journal   = {Hugging Face repository}
}