
Locutusque/llama-3-neural-chat-v2.2-8B AWQ


Model Details

I fine-tuned Llama 3 8B using an approach similar to Intel's Neural Chat language model, with slightly modified data sources so the model is stronger in coding, math, and writing. I used both SFT and DPO-Positive; DPO-Positive dramatically improves performance over standard DPO.
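For reference, this is a sketch of the DPO-Positive (DPOP) objective as introduced in the Smaug paper (Pal et al., 2024); the notation and the penalty weight λ follow that paper and are not specific to this model's training run:

$$
\mathcal{L}_{\mathrm{DPOP}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\Big(\beta\Big(\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} - \lambda \cdot \max\Big(0,\ \log\frac{\pi_{\mathrm{ref}}(y_w \mid x)}{\pi_\theta(y_w \mid x)}\Big)\Big)\Big)\Big]
$$

The max term penalizes the policy whenever the probability of the preferred completion drops below the reference model's, which is the failure mode of plain DPO that DPOP is designed to avoid.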

About AWQ

AWQ is an efficient, accurate, and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.
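For illustration, here is a minimal sketch of how an AWQ checkpoint like this one is typically produced with the AutoAWQ library. The quantization settings shown are the common defaults, and the output path is a placeholder; they are assumptions, not the exact configuration used for this repo:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base = "Locutusque/llama-3-neural-chat-v2.2-8B"  # the FP16 source model
quant_path = "llama-3-neural-chat-v2.2-8B-AWQ"   # hypothetical output directory

# Common 4-bit AWQ settings: group size 128, zero-point, GEMM kernels
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Calibrate and quantize the weights, then save the AWQ checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```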

AWQ models are currently supported on Linux and Windows with NVIDIA GPUs only. macOS users should use GGUF models instead.

It is supported by:

- Text Generation Webui (using the AutoAWQ loader)
- vLLM
- Hugging Face Text Generation Inference (TGI)
- Transformers (with AutoAWQ installed)
- AutoAWQ
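As a minimal inference sketch, the model can be loaded directly with Transformers (which reads AWQ checkpoints when autoawq is installed); the prompt and generation parameters below are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "solidrust/llama-3-neural-chat-v2.2-8B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama 3 instruct-style chat formatting via the bundled chat template
messages = [{"role": "user", "content": "Write a haiku about quantization."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```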
