
Sao10K/MN-12B-Lyra-v1

About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.

AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead.

It is supported by:

- Text Generation Webui (using the AutoAWQ loader)
- vLLM
- Hugging Face Text Generation Inference (TGI)
- Transformers (version 4.35.0 and later)
- AutoAWQ, for use from Python code

A loading example follows the list.
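As a minimal sketch of loading the checkpoint from Python (assuming transformers >= 4.35 with the autoawq package installed, and an NVIDIA GPU available):

```python
# Minimal sketch: load and run this AWQ checkpoint with Transformers.
# Assumes transformers >= 4.35 and the autoawq package are installed,
# and an NVIDIA GPU is available (AWQ inference is CUDA-only).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cmshin96/MN-12B-Lyra-v1-awq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantized weights on the GPU
)

prompt = "Write a short poem about quantization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```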

Safetensors
Model size: 3.09B params
Tensor types: I32, FP16

Note: the 4-bit AWQ weights are packed into I32 tensors, so the reported parameter count is well below the 12B of the base model.
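To confirm how the weights are stored, one can inspect the quantization settings recorded in the repository's config.json (a minimal sketch; assumes transformers is installed):

```python
# Sketch: inspect the AWQ quantization settings stored in config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("cmshin96/MN-12B-Lyra-v1-awq")

# For AWQ checkpoints this typically reports the bit width (4),
# the group size, and the quantization backend.
print(config.quantization_config)
```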

Model tree for cmshin96/MN-12B-Lyra-v1-awq

Quantized from Sao10K/MN-12B-Lyra-v1