---
license: mit
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
new_version: Qwen/QwQ-32B-Preview
---
## Evaluation Results
### Evaluation Metrics
| Groups          | Version | Filter | n-shot | Metric | Direction | Value  | Stderr  |
|-----------------|---------|--------|--------|--------|-----------|--------|---------|
| mmlu            | 2       | none   | -      | acc    | ↑         | 0.8034 | ±0.0032 |
| humanities      | 2       | none   | -      | acc    | ↑         | 0.7275 | ±0.0062 |
| other           | 2       | none   | -      | acc    | ↑         | 0.8323 | ±0.0064 |
| social sciences | 2       | none   | -      | acc    | ↑         | 0.8856 | ±0.0056 |
| stem            | 2       | none   | -      | acc    | ↑         | 0.8081 | ±0.0068 |
### Description
- mmlu: Overall accuracy across multiple domains.
- humanities: Accuracy in humanities-related tasks.
- other: Accuracy in other unspecified domains.
- social sciences: Accuracy in social sciences-related tasks.
- stem: Accuracy in STEM (Science, Technology, Engineering, Mathematics) related tasks.
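The table above follows the reporting format of EleutherAI's lm-evaluation-harness. As a minimal, hedged sketch (assuming the `lm_eval` package is installed; the exact task version, arguments, and hardware used for the numbers above are not documented here), similar scores could be reproduced along these lines:

```python
# Hypothetical reproduction sketch using lm-evaluation-harness (pip install lm_eval).
# The arguments below are assumptions, not a record of how the table was produced.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args=(
        "pretrained=Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit,"
        "device_map=auto"
    ),
    # The "mmlu" group also reports the humanities/other/social sciences/stem subgroups
    tasks=["mmlu"],
)

# Print the per-group metric dictionaries (accuracy and stderr)
for name, metrics in results["results"].items():
    print(name, metrics)
```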
### Visualization
If your Markdown renderer supports Mermaid, the following diagram visualizes the accuracy metrics across groups:
```mermaid
xychart-beta
    title "Accuracy Metrics by Group"
    x-axis ["mmlu", "humanities", "other", "social sciences", "stem"]
    y-axis "Accuracy" 0 --> 1
    bar [0.8034, 0.7275, 0.8323, 0.8856, 0.8081]
```
# QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit



## Model Description
**QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit** is a quantized version of the QwQ-32B-Preview model, optimized for efficient inference without significant loss in quality. It was quantized with **AutoRound** and saved in the **GPTQ** format using symmetric 4-bit weights. Quantization reduces the model's size and computational requirements, making it better suited to deployment in resource-constrained environments.
### Features
- **Quantization Method**: AutoRound with GPTQ
- **Bit Precision**: 4-bit symmetric quantization
- **Group Size**: 128
- **Efficiency**: Optimized for low GPU memory usage
- **Compatibility**: Compatible with Hugging Face's Transformers library
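These settings map directly onto AutoRound's quantization options. As a minimal, hedged sketch (assuming the `auto-round` package; argument names and the export format string may differ between versions, and this is not the exact script used to build this checkpoint), a similar 4-bit symmetric GPTQ export could look like:

```python
# Hypothetical quantization sketch with auto-round (pip install auto-round).
# Illustrates the feature list above; not the exact recipe for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit, symmetric, group size 128 -- matching the feature list above
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Export in GPTQ format so transformers can load it with a GPTQ backend
autoround.save_quantized(
    "QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit",
    format="auto_gptq",
)
```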
## Intended Uses
- **Natural Language Processing (NLP)**: Suitable for tasks such as text generation, translation, summarization, and question-answering.
- **Deployment in Resource-Constrained Environments**: Ideal for applications requiring efficient inference on devices with limited computational resources.
- **Research and Development**: Useful for researchers exploring model compression and quantization techniques.
**Note**: This model is intended for non-commercial research and experimentation purposes. Users should evaluate the model's performance in their specific use cases before deployment.
## Limitations
- **Performance Trade-off**: While quantization significantly reduces model size and increases inference speed, it may introduce slight degradations in performance compared to the full-precision version.
- **Compatibility**: The quantized model may not be compatible with all libraries and frameworks. Ensure compatibility with your deployment environment.
- **Bias and Fairness**: As with all language models, this model may inherit biases present in the training data. Users should be cautious and perform thorough evaluations before deploying in sensitive applications.
## Usage Example
Here's a simple example of how to load and use the quantized model with Hugging Face's Transformers library. Loading a GPTQ checkpoint typically also requires a GPTQ backend (for example `optimum` together with `auto-gptq` or `gptqmodel`) to be installed.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the quantized model; the GPTQ quantization config stored in the checkpoint
# is picked up automatically, so no extra quantization flags are needed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

# Prepare input
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate text
outputs = model.generate(**inputs, max_new_tokens=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

Example output (will vary between runs):

```text
Once upon a time, in a land far away, there lived a...
```
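Alternatively, the high-level `pipeline` helper can wrap the same steps. This is a hedged sketch: the prompt and generation length are illustrative, and the same GPTQ backend requirement applies.

```python
from transformers import pipeline

# Text-generation pipeline around the same quantized checkpoint
generator = pipeline(
    "text-generation",
    model="Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit",
    device_map="auto",
)

print(generator("Once upon a time", max_new_tokens=50)[0]["generated_text"])
```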