---
license: mit
language:
  - en
base_model:
  - Qwen/QwQ-32B-Preview
new_version: Qwen/QwQ-32B-Preview
---

# QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit

![License](https://img.shields.io/badge/license-MIT-blue)
![Stars](https://img.shields.io/badge/stars-0-lightgrey.svg)
![Downloads](https://img.shields.io/badge/downloads-0-lightgrey.svg)

## Evaluation Results

### Evaluation Metrics

| **Groups** | **Version** | **Filter** | **n-shot** | **Metric** | **Direction** | **Value** | **Stderr** |
|---------------------------------|:-----------:|:----------:|:----------:|:----------:|:-------------:|----------:|-----------:|
| **mmlu** | 2 | none | - | acc | ↑ | 0.8034 | ±0.0032 |
| &nbsp;&nbsp;**humanities** | 2 | none | - | acc | ↑ | 0.7275 | ±0.0062 |
| &nbsp;&nbsp;**other** | 2 | none | - | acc | ↑ | 0.8323 | ±0.0064 |
| &nbsp;&nbsp;**social sciences** | 2 | none | - | acc | ↑ | 0.8856 | ±0.0056 |
| &nbsp;&nbsp;**stem** | 2 | none | - | acc | ↑ | 0.8081 | ±0.0068 |

### Description

- **mmlu**: Overall accuracy across all MMLU domains.
- **humanities**: Accuracy on humanities tasks.
- **other**: Accuracy on the remaining, uncategorized domains.
- **social sciences**: Accuracy on social-science tasks.
- **stem**: Accuracy on STEM (Science, Technology, Engineering, Mathematics) tasks.

### Visualization

Where Mermaid rendering is supported, the following chart visualizes accuracy by group:

```mermaid
xychart-beta
    title "Accuracy Metrics by Group"
    x-axis ["mmlu", "humanities", "other", "social sciences", "stem"]
    y-axis "Accuracy" 0 --> 1
    bar [0.8034, 0.7275, 0.8323, 0.8856, 0.8081]
```

## Model Description

**QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit** is a quantized version of the QwQ-32B-Preview model, optimized for efficient inference with minimal loss in quality. The model was quantized with **AutoRound** and exported in the GPTQ (Generative Pre-trained Transformer Quantization) format using symmetric 4-bit weights. Quantization reduces the model's size and computational requirements, making it better suited to deployment in resource-constrained environments.

### Features

- **Quantization method**: AutoRound with GPTQ export (an illustrative recipe is sketched below)
- **Bit precision**: 4-bit, symmetric quantization
- **Group size**: 128
- **Efficiency**: Optimized for low GPU memory usage
- **Compatibility**: Compatible with Hugging Face's Transformers library

## Intended Uses

- **Natural language processing (NLP)**: text generation, translation, summarization, and question answering.
- **Resource-constrained deployment**: efficient inference on devices with limited compute and memory.
- **Research and development**: a useful artifact for studying model compression and quantization techniques.

**Note**: This model is intended for non-commercial research and experimentation. Evaluate its performance on your specific use case before deployment.

## Limitations

- **Performance trade-off**: Quantization substantially reduces model size and speeds up inference, but it can introduce small accuracy degradations compared to the full-precision model.
- **Compatibility**: The quantized checkpoint may not work with every library or framework; verify compatibility with your deployment environment.
- **Bias and fairness**: As with all language models, this model may inherit biases present in its training data. Evaluate carefully before deploying it in sensitive applications.
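## Quantization Recipe (Illustrative Sketch)

The exact quantization script is not included with this card. The sketch below shows how the settings listed under **Features** (AutoRound, GPTQ export, symmetric 4-bit, group size 128) map onto Intel's `auto-round` library, assuming its standard workflow; treat it as a starting point rather than the precise recipe used to produce this checkpoint.

```python
# Illustrative sketch only: applies the settings listed under "Features"
# (AutoRound, GPTQ export, symmetric 4-bit, group size 128).
# Assumes Intel's auto-round library (`pip install auto-round`); this is not
# necessarily the exact script used to produce this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base_model = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Configure AutoRound with the same settings this card advertises
autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # 4-bit weights
    group_size=128,  # quantization group size
    sym=True,        # symmetric quantization
)
autoround.quantize()

# Export in GPTQ format so the result loads as a GPTQ checkpoint in Transformers
autoround.save_quantized(
    "./QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit",
    format="auto_gptq",
)
```

Note that quantizing a 32B-parameter model this way requires substantial GPU memory and time for the calibration pass.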
## Usage Example

Here's a simple example of how to load and use the quantized model with Hugging Face's Transformers library. Depending on your Transformers version, loading a GPTQ checkpoint may also require `optimum` and a GPTQ backend such as `auto-gptq` or `gptqmodel`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the quantized model; the GPTQ quantization config stored with the
# checkpoint is detected automatically, so no `load_in_4bit` flag is needed
# (that flag enables bitsandbytes quantization, which does not apply here).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Prepare input
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate text
outputs = model.generate(**inputs, max_new_tokens=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
# Example output (will vary): Once upon a time, in a land far away, there lived a...
```
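## Reproducing the Evaluation (Sketch)

The table under **Evaluation Results** is formatted like output from EleutherAI's `lm-evaluation-harness`; whether that harness produced the numbers is an assumption. If it did, a run along the following lines should reproduce the aggregate MMLU score, with the exact arguments (batch size, device placement) left as illustrative choices.

```python
# Hypothetical evaluation run, assuming the MMLU table above was produced with
# EleutherAI's lm-evaluation-harness (`pip install lm-eval`). Arguments are
# illustrative; adjust batch size and device settings to your hardware.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit,"
        "device_map=auto"
    ),
    tasks=["mmlu"],
    batch_size="auto",
)

# Print the aggregate MMLU metrics (the "mmlu" row in the table above)
print(results["results"]["mmlu"])
```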