---
license: mit
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
new_version: Qwen/QwQ-32B-Preview
---

## Evaluation Results

### Evaluation Metrics

| **Groups**          | **Version** | **Filter** | **n-shot** | **Metric** | **Direction** | **Value** | **Stderr** |
|---------------------|:-----------:|:----------:|:----------:|:----------:|:-------------:|----------:|-----------:|
| **mmlu**            | 2           | none       | -          | acc        | ↑             | 0.8034    | ±0.0032    |
| **humanities**      | 2           | none       | -          | acc        | ↑             | 0.7275    | ±0.0062    |
| **other**           | 2           | none       | -          | acc        | ↑             | 0.8323    | ±0.0064    |
| **social sciences** | 2           | none       | -          | acc        | ↑             | 0.8856    | ±0.0056    |
| **stem**            | 2           | none       | -          | acc        | ↑             | 0.8081    | ±0.0068    |

### Description

- **mmlu**: Overall accuracy across all MMLU domains.

- **humanities**: Accuracy on the humanities task group.

- **other**: Accuracy on the "other" task group (domains outside the humanities, social sciences, and STEM groups).

- **social sciences**: Accuracy on the social sciences task group.

- **stem**: Accuracy on the STEM (Science, Technology, Engineering, Mathematics) task group.

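The table above appears to follow the output format of EleutherAI's lm-evaluation-harness. Below is a minimal sketch of how comparable numbers could be reproduced; the `simple_evaluate` arguments (task selection, batch size, and the absence of a few-shot setting) are assumptions, not the exact configuration used for this card.

```python
# Illustrative reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The arguments below are assumptions, not the exact
# settings used to produce the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face Transformers backend
    model_args="pretrained=Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit",
    tasks=["mmlu"],  # also reports the humanities/other/social sciences/stem groups
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```
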
# QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit

![License](https://img.shields.io/badge/license-MIT-blue)
![Stars](https://img.shields.io/badge/stars-0-lightgrey.svg)
![Downloads](https://img.shields.io/badge/downloads-0-lightgrey.svg)

## Model Description

**QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit** is a quantized version of the QwQ-32B-Preview model, optimized for efficient inference with minimal loss in accuracy. The model was quantized with **AutoRound** and exported in the GPTQ format, using symmetric 4-bit weight quantization. Quantization reduces the model's size and compute requirements, making it better suited to deployment in resource-constrained environments.

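As a rough illustration of the savings, 4-bit weights take half a byte each instead of the two bytes used at FP16/BF16. The back-of-the-envelope estimate below assumes roughly 32.5B parameters for QwQ-32B-Preview and ignores activation memory, the KV cache, and the per-group scale/zero-point metadata that quantized formats add:

```python
# Back-of-the-envelope weight-memory estimate (illustrative only; the exact
# parameter count and quantization metadata overhead are not accounted for).
params = 32.5e9  # approximate parameter count of QwQ-32B-Preview

fp16_gib = params * 2 / 1024**3    # 2 bytes per weight
int4_gib = params * 0.5 / 1024**3  # 4 bits = 0.5 bytes per weight

print(f"FP16/BF16 weights: ~{fp16_gib:.0f} GiB")
print(f"4-bit weights:     ~{int4_gib:.0f} GiB")
```
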
### Features

- **Quantization Method**: AutoRound, exported in the GPTQ format (a reproduction sketch follows this list)

- **Bit Precision**: 4-bit symmetric quantization

- **Group Size**: 128

- **Efficiency**: Optimized for low GPU memory usage

- **Compatibility**: Compatible with Hugging Face's Transformers library

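For reference, a checkpoint with these settings could be produced with Intel's auto-round library roughly as follows. This is a minimal sketch under the assumption that the published `AutoRound` interface (the `bits`/`group_size`/`sym` arguments and the `auto_gptq` export format) matches your installed version; it is not the exact script used for this release.

```python
# Minimal AutoRound quantization sketch (pip install auto-round).
# Not the exact recipe used for this model; check the argument names against
# your installed auto-round version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization, as listed above
    sym=True,        # symmetric quantization
)
autoround.quantize()

# Export in GPTQ format so the checkpoint loads through Transformers + a GPTQ backend.
autoround.save_quantized(
    "QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit", format="auto_gptq"
)
```
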
## Intended Uses

- **Natural Language Processing (NLP)**: Suitable for tasks such as text generation, translation, summarization, and question answering.

- **Deployment in Resource-Constrained Environments**: Ideal for applications requiring efficient inference on devices with limited computational resources.

- **Research and Development**: Useful for researchers exploring model compression and quantization techniques.

**Note**: This model is intended for non-commercial research and experimentation purposes. Users should evaluate the model's performance in their specific use cases before deployment.

## Limitations

- **Performance Trade-off**: While quantization significantly reduces model size and speeds up inference, it may introduce a slight loss of accuracy compared to the full-precision model; a simple way to check this on your own data is sketched after this list.

- **Compatibility**: The quantized model may not work with every library or framework. Verify compatibility with your deployment environment before use.

- **Bias and Fairness**: As with all language models, this model may inherit biases present in its training data. Users should be cautious and perform thorough evaluations before deploying it in sensitive applications.

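One simple way to gauge the performance trade-off is to compare perplexity on a held-out text sample against the full-precision base model. The sketch below is illustrative: the sample text and window length are placeholders, and loading the unquantized 32B baseline requires substantially more GPU memory.

```python
# Illustrative perplexity comparison on a small text sample. The sample text
# and sequence length are placeholders, not a benchmark.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str, max_len: int = 512) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    max_length=max_len).to(model.device)
    with torch.no_grad():
        # Using the inputs as labels yields the average next-token loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

sample = "The quick brown fox jumps over the lazy dog. " * 50

quantized_ppl = perplexity(
    "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit", sample
)
print(f"Quantized perplexity: {quantized_ppl:.2f}")

# Optional full-precision baseline (needs far more GPU memory):
# baseline_ppl = perplexity("Qwen/QwQ-32B-Preview", sample)
```
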
## Usage Example

Here's a simple example of how to load and use the quantized model with Hugging Face's Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the quantized model. The GPTQ settings are stored in the checkpoint's
# quantization config, so no extra 4-bit flags are needed here; a GPTQ-capable
# backend (e.g. auto-gptq or gptqmodel) is typically required.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

# Prepare input
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate text
outputs = model.generate(**inputs, max_new_tokens=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```

Example output (illustrative):

```text
Once upon a time, in a land far away, there lived a...
```

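QwQ-32B-Preview is tuned as a conversational reasoning model, so prompts generally work better through the chat template than as raw text. The variant below uses `tokenizer.apply_chat_template`; the question and generation settings are placeholders, and long reasoning chains may need a larger `max_new_tokens`.

```python
# Chat-style usage (illustrative; the question and settings are placeholders).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Satwik11/QwQ-32B-Preview-quantized-autoround-GPTQ-sym-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 20?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```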