---
base_model: Qwen/QwQ-32B
---

# MISHANM/Qwen-QwQ-32B-fp8

This is an fp8 quantized version of Qwen/QwQ-32B, built for efficient inference on supported hardware while balancing speed and accuracy. It retains the base model's thinking and reasoning capabilities, making it well suited to downstream tasks that involve complex and challenging problems.

## Model Details

1. Tasks: Causal Language Modeling, Text Generation
2. Base Model: Qwen/QwQ-32B
3. Quantization Format: fp8 (see the sketch below)
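
The card does not state how the fp8 checkpoint was produced. As a point of reference, the sketch below shows one common way to create an fp8 (dynamic activation scales) checkpoint from the base model using the llm-compressor library; the library choice, recipe, and output path are assumptions for illustration, not a description of this repository's actual quantization pipeline.

```python
# Hypothetical sketch: fp8 dynamic quantization with llm-compressor (not necessarily
# how MISHANM/Qwen-QwQ-32B-fp8 was produced). Import paths may differ across versions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

base_id = "Qwen/QwQ-32B"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Quantize all Linear layers to fp8, keeping the LM head in higher precision
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save the quantized weights and tokenizer (illustrative output directory)
save_dir = "QwQ-32B-fp8"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```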

## Device Used

1. GPUs: 1 x AMD Instinct™ MI210 Accelerator

## Inference with HuggingFace

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and tokenizer
model_path = "MISHANM/Qwen-QwQ-32B-fp8"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_new_tokens=1000, temperature=0.9):
    # Build the conversation for the model's chat template
    messages = [
        {"role": "system", "content": "Give response to the user query."},
        {"role": "user", "content": prompt},
    ]

    # Apply the chat template and append the assistant generation prompt
    formatted_prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Tokenize and generate output
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = """Give a poem on LLM."""
text = generate_text(prompt)
print(text)
```
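
For interactive use you can also stream tokens to the console as they are generated instead of waiting for the full completion. The snippet below is a minimal sketch that reuses the `model` and `tokenizer` loaded above and passes a `TextStreamer` to `generate`; the prompt and sampling settings are illustrative.

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [{"role": "user", "content": "Give a poem on LLM."}]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Tokens are printed incrementally; the full sequence is still returned at the end
_ = model.generate(
    **inputs, max_new_tokens=1000, temperature=0.9, do_sample=True, streamer=streamer
)
```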
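
fp8 checkpoints are often served with vLLM as an alternative to plain transformers inference. The card does not state whether this particular checkpoint is stored in a format vLLM can load directly, so treat the sketch below as an assumption-laden starting point rather than a documented, supported path; the sampling settings are illustrative.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "MISHANM/Qwen-QwQ-32B-fp8"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Assumption: the repository's fp8 format is directly loadable by vLLM on supported hardware
llm = LLM(model=model_path)
params = SamplingParams(temperature=0.9, max_tokens=1000)

# Build the prompt with the model's chat template, then generate
messages = [{"role": "user", "content": "Give a poem on LLM."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```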

## Citation Information

```
@misc{MISHANM/Qwen-QwQ-32B-fp8,
  author = {Mishan Maurya},
  title = {Introducing fp8 quantized version of Qwen/QwQ-32B},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
}
```