---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- anthracite-org/magnum-v4-123b
---

# Magnum-v4-123b HQQ

This repo contains magnum-v4-123b quantized to 4-bit precision using [HQQ](https://github.com/mobiusml/hqq/).

HQQ provides a level of precision similar to AWQ at 4-bit, but requires no calibration data. This quant was generated on 8×A40s in only 10 minutes.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_path = "anthracite-org/magnum-v4-123b"

# 4-bit HQQ quantization, applied on the fly while loading the model
quant_config = HqqConfig(nbits=4, group_size=128, axis=1)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    cache_dir='.',
    device_map="cuda:0",
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Save the quantized weights and tokenizer
output_path = "magnum-v4-123b-hqq-4bit"
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```

## Inference

You can run inference directly with transformers, or with [aphrodite](https://github.com/PygmalionAI/aphrodite-engine):

```sh
pip install aphrodite-engine
aphrodite run alpindale/magnum-v4-123b-hqq-4bit -tp 2
```
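
For inference through transformers, a minimal sketch is shown below. It assumes the quantized checkpoint is available locally at the `magnum-v4-123b-hqq-4bit` path produced by the quantization script above; the prompt and generation parameters are only illustrative.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path written by save_pretrained() above (or a local download of this repo)
model_path = "magnum-v4-123b-hqq-4bit"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Illustrative prompt; adjust the content and sampling settings to taste
messages = [{"role": "user", "content": "Write a short scene set on a rainy space station."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```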