unsloth/DeepSeek-V3-GGUF · Advice on running llama-server with Q2_K

My primary high memory workstation is a 44 core 256 GB machine, but I get malloc/oom errors when trying to start llama-server:

./llama-server -m /models/DeepSeek-V3-GGUF/DeepSeek-V3-Q2_K_L/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf -c 0 --cache-type-k q5_0

gml_aligned_malloc: insufficient memory (attempted to allocate 473360.00 MB)
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 496353935360
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
common_init_from_params: failed to create context with model '/media/vmajor/AI2/models/DeepSeek-V3-GGUF/DeepSeek-V3-Q2_K_L/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf'
srv load_model: failed to load model, '/models/DeepSeek-V3-GGUF/DeepSeek-V3-Q2_K_L/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf'
main: exiting due to model loading error

Does anyone have any advice on how to overcome this? I can of course increase swap but that will cause a significant performance regression. I can try offloading a few layers to the GPU but this system only has an RTX 3060 with 12 GB VRAM so that will not help that much.

unsloth
/

DeepSeek-V3-GGUF

Advice on running llama-server with Q2_K_L quant