---
license: mit
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
new_version: Qwen/QwQ-32B-Preview
---

## Evaluation Results

### Evaluation Metrics

| **Groups**          | **Version** | **Filter** | **n-shot** | **Metric** | **Direction** | **Value** | **Stderr** |
|---------------------|:-----------:|:----------:|:----------:|:----------:|:-------------:|----------:|-----------:|
| **mmlu**            | 2           | none       | -          | acc        | ↑             | 0.8034    | ±0.0032    |
| **humanities**      | 2           | none       | -          | acc        | ↑             | 0.7275    | ±0.0062    |
| **other**           | 2           | none       | -          | acc        | ↑             | 0.8323    | ±0.0064    |
| **social sciences** | 2           | none       | -          | acc        | ↑             | 0.8856    | ±0.0056    |
| **stem**            | 2           | none       | -          | acc        | ↑             | 0.8081    | ±0.0068    |

### Description

- **mmlu**: Overall accuracy across all MMLU domains.

- **humanities**: Accuracy on the humanities task group.

- **other**: Accuracy on the "other" task group (domains outside the humanities, social sciences, and STEM groups).

- **social sciences**: Accuracy on the social sciences task group.

- **stem**: Accuracy on the STEM (Science, Technology, Engineering, Mathematics) task group.

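The table above appears to follow the output format of EleutherAI's lm-evaluation-harness. Below is a minimal sketch of how comparable numbers could be reproduced; the `simple_evaluate` arguments (task selection, batch size, and the absence of a few-shot setting) are assumptions, not the exact configuration used for this card.

```python
# Illustrative reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The arguments below are assumptions, not the exact
# settings used to produce the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face Transformers backend
    model_args="pretrained=Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit",
    tasks=["mmlu"],  # also reports the humanities/other/social sciences/stem groups
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```
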
# QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit

![License](https://img.shields.io/badge/license-MIT-blue)
![Stars](https://img.shields.io/badge/stars-0-lightgrey.svg)
![Downloads](https://img.shields.io/badge/downloads-0-lightgrey.svg)

## Model Description

**QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit** is a quantized version of the QwQ-32B-Preview model, optimized for efficient inference with minimal loss in accuracy. The model was quantized with **AutoRound** and exported in the GPTQ format, using symmetric 4-bit weight quantization. Quantization reduces the model's size and compute requirements, making it better suited to deployment in resource-constrained environments.

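As a rough illustration of the savings, 4-bit weights take half a byte each instead of the two bytes used at FP16/BF16. The back-of-the-envelope estimate below assumes roughly 32.5B parameters for QwQ-32B-Preview and ignores activation memory, the KV cache, and the per-group scale/zero-point metadata that quantized formats add:

```python
# Back-of-the-envelope weight-memory estimate (illustrative only; the exact
# parameter count and quantization metadata overhead are not accounted for).
params = 32.5e9  # approximate parameter count of QwQ-32B-Preview

fp16_gib = params * 2 / 1024**3    # 2 bytes per weight
int4_gib = params * 0.5 / 1024**3  # 4 bits = 0.5 bytes per weight

print(f"FP16/BF16 weights: ~{fp16_gib:.0f} GiB")
print(f"4-bit weights:     ~{int4_gib:.0f} GiB")
```
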
### Features

- **Quantization Method**: AutoRound, exported in the GPTQ format (a reproduction sketch follows this list)

- **Bit Precision**: 4-bit symmetric quantization

- **Group Size**: 128

- **Efficiency**: Optimized for low GPU memory usage

- **Compatibility**: Compatible with Hugging Face's Transformers library

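For reference, a checkpoint with these settings could be produced with Intel's auto-round library roughly as follows. This is a minimal sketch under the assumption that the published `AutoRound` interface (the `bits`/`group_size`/`sym` arguments and the `auto_gptq` export format) matches your installed version; it is not the exact script used for this release.

```python
# Minimal AutoRound quantization sketch (pip install auto-round).
# Not the exact recipe used for this model; check the argument names against
# your installed auto-round version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization, as listed above
    sym=True,        # symmetric quantization
)
autoround.quantize()

# Export in GPTQ format so the checkpoint loads through Transformers + a GPTQ backend.
autoround.save_quantized(
    "QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit", format="auto_gptq"
)
```
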
## Intended Uses

- **Natural Language Processing (NLP)**: Suitable for tasks such as text generation, translation, summarization, and question answering.

- **Deployment in Resource-Constrained Environments**: Ideal for applications requiring efficient inference on devices with limited computational resources.

- **Research and Development**: Useful for researchers exploring model compression and quantization techniques.

**Note**: This model is intended for non-commercial research and experimentation purposes. Users should evaluate the model's performance in their specific use cases before deployment.

## Limitations

- **Performance Trade-off**: While quantization significantly reduces model size and speeds up inference, it may introduce a slight loss of accuracy compared to the full-precision model; a simple way to check this on your own data is sketched after this list.

- **Compatibility**: The quantized model may not work with every library or framework. Verify compatibility with your deployment environment before use.

- **Bias and Fairness**: As with all language models, this model may inherit biases present in its training data. Users should be cautious and perform thorough evaluations before deploying it in sensitive applications.

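One simple way to gauge the performance trade-off is to compare perplexity on a held-out text sample against the full-precision base model. The sketch below is illustrative: the sample text and window length are placeholders, and loading the unquantized 32B baseline requires substantially more GPU memory.

```python
# Illustrative perplexity comparison on a small text sample. The sample text
# and sequence length are placeholders, not a benchmark.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str, max_len: int = 512) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    max_length=max_len).to(model.device)
    with torch.no_grad():
        # Using the inputs as labels yields the average next-token loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

sample = "The quick brown fox jumps over the lazy dog. " * 50

quantized_ppl = perplexity(
    "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit", sample
)
print(f"Quantized perplexity: {quantized_ppl:.2f}")

# Optional full-precision baseline (needs far more GPU memory):
# baseline_ppl = perplexity("Qwen/QwQ-32B-Preview", sample)
```
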
## Usage Example

Here's a simple example of how to load and use the quantized model with Hugging Face's Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the quantized model. The GPTQ settings are stored in the checkpoint's
# quantization config, so no extra 4-bit flags are needed here; a GPTQ-capable
# backend (e.g. auto-gptq or gptqmodel) is typically required.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

# Prepare input
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate text
outputs = model.generate(**inputs, max_new_tokens=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```

Example output (illustrative):

```text
Once upon a time, in a land far away, there lived a...
```

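QwQ-32B-Preview is tuned as a conversational reasoning model, so prompts generally work better through the chat template than as raw text. The variant below uses `tokenizer.apply_chat_template`; the question and generation settings are placeholders, and long reasoning chains may need a larger `max_new_tokens`.

```python
# Chat-style usage (illustrative; the question and settings are placeholders).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 20?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```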