GGUF Model, Can I split it?
Based on NyxKrage's LLM VRAM calculator
Max Allocated VRAM
Model (unquantized)
(Model picker: lists non-quantized repos only; GGUF, AWQ, GPTQ, and exl2 variants are filtered out.)
Context Size
Quantization Size: Q4_K_S
Batch Size
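The Context Size input mostly drives the KV-cache share of the estimate. The sketch below (TypeScript) shows the usual way that footprint is computed; the architecture fields (nLayers, nKvHeads, headDim), the fp16 cache, and the example numbers are assumptions for illustration, not the calculator's exact formula.

```typescript
// Rough KV-cache estimate for a given context length.
// ASSUMPTION: one K and one V tensor per layer, fp16 cache (2 bytes/element);
// the architecture numbers below are illustrative, not tied to a specific model.
interface ModelArch {
  nLayers: number;  // transformer blocks
  nKvHeads: number; // key/value heads (grouped-query attention aware)
  headDim: number;  // dimension per attention head
}

function kvCacheBytes(arch: ModelArch, contextSize: number, bytesPerElem = 2): number {
  // K and V per layer, each of shape [contextSize, nKvHeads * headDim]
  return 2 * arch.nLayers * contextSize * arch.nKvHeads * arch.headDim * bytesPerElem;
}

// Example: a hypothetical 7B-class model at a 32k context
const arch: ModelArch = { nLayers: 32, nKvHeads: 8, headDim: 128 };
console.log((kvCacheBytes(arch, 32768) / 1024 ** 3).toFixed(2) + " GB"); // "4.00 GB"
```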
Model Size (GB): 4.20
Context Size (GB): 6.90
Total Size (GB): 420.69
Layer Size (GB): 42.69
Layers offloaded to GPU: 42
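The last line, layers offloaded to GPU, is the number you would pass to llama.cpp as -ngl (--n-gpu-layers). Below is a minimal sketch of one way to derive it from the per-layer size, the context memory, and the VRAM budget; the 0.5 GB overhead reserve and the equal-size-layers assumption are simplifications of mine, not necessarily what this calculator does.

```typescript
// Sketch: how many whole layers fit on the GPU (what llama.cpp takes as -ngl),
// given a per-layer size, the context/KV memory, and the VRAM budget.
// ASSUMPTIONS: layers are roughly equal in size, the KV cache lives in VRAM,
// and 0.5 GB is reserved for driver/compute buffers.
function layersOffloaded(
  totalLayers: number,
  layerSizeGb: number,
  contextGb: number,
  vramGb: number,
  overheadGb = 0.5,
): number {
  const usableGb = vramGb - contextGb - overheadGb;
  if (usableGb <= 0) return 0;
  return Math.min(totalLayers, Math.floor(usableGb / layerSizeGb));
}

// Example: 33 layers of ~0.13 GB each (a ~4.2 GB Q4 model), 6.9 GB of context, 12 GB card
console.log(layersOffloaded(33, 0.13, 6.9, 12)); // 33 -> the whole model fits on the GPU
```

If the result is smaller than the model's total layer count, the remaining layers stay in system RAM and run on the CPU, which is the "split" the page title refers to.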