Generated using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ): `pip install git+https://github.com/casper-hansen/AutoAWQ.git@f0321eedca887c12680553fc561d176b03b1b9a5 flash_attn`

The following code was used for quantization:
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'models/Phi-3-medium-128k-instruct'
quant_path = 'models/Phi-3-medium-128k-instruct-awq'

# 4-bit weights, zero-point (asymmetric) quantization, group size 128, GEMM kernel
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model and tokenizer (Phi-3 requires trust_remote_code)
model = AutoAWQForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize (runs calibration with AutoAWQ's default calibration data)
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
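To give an intuition for what `quant_config` requests: with `w_bit=4`, `zero_point=True`, and `q_group_size=128`, each group of 128 weights shares one scale and one integer zero point, and each weight is stored as a 4-bit integer in `[0, 15]`. The sketch below is a plain-NumPy illustration of group-wise zero-point quantization, not AutoAWQ's actual implementation (which also applies activation-aware scaling before quantizing):

```python
import numpy as np

def quantize_group(w, n_bits=4):
    """Asymmetric (zero-point) quantization of one weight group.
    Illustrative sketch only, not AutoAWQ's kernel."""
    qmax = 2 ** n_bits - 1                 # 15 for 4-bit
    scale = (w.max() - w.min()) / qmax     # one scale per group
    zero = -np.round(w.min() / scale)      # integer zero point
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return q.astype(np.uint8), scale, zero

def dequantize_group(q, scale, zero):
    # Reconstruct approximate float weights from the 4-bit codes
    return (q.astype(np.float32) - zero) * scale

# With q_group_size=128, each group of 128 weights shares (scale, zero)
rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, scale, zero = quantize_group(w)
w_hat = dequantize_group(q, scale, zero)
print("max abs error:", np.abs(w - w_hat).max(), "half step:", scale / 2)
```

The reconstruction error per weight is bounded by half a quantization step (`scale / 2`); smaller groups give smaller scales and lower error, at the cost of storing more scale/zero-point metadata.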

Original model: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct

---
license: mit
---
|