FlatDolphinMaid-8x7B 4bpw

You probably want the 3.5bpw version. It just fits in 24GB of VRAM at half context (16384).

If you really want the larger context, 3bpw should do it, but you are probably better off with the GGUF version at a higher quant.

I did make a 4bpw version; it might work in a headless or multi-GPU setup.
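
As a rough sanity check on the sizes above, here is a minimal sketch that estimates the weight footprint at each bpw. The parameter count (about 46.7B, the usual figure for a Mixtral-style 8x7B) is an assumption, not something measured on this model, and the estimate covers weights only: the context cache and runtime overhead come on top of it.

```python
def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    # bits -> bytes (divide by 8) -> GiB (divide by 2**30)
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: roughly 46.7B parameters, typical for a Mixtral-style 8x7B model.
ASSUMED_PARAMS = 46.7e9

for bpw in (3.0, 3.5, 4.0):
    size = weight_size_gib(ASSUMED_PARAMS, bpw)
    print(f"{bpw} bpw: ~{size:.1f} GiB of weights, before cache and overhead")
```

Whatever is left of the 24GB after the weights is what the cache has to fit into, which is why the higher bpw versions leave less room for context.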

Make sure you enable the 8-bit cache.

### Instruction:
{system prompt}

### Input:
{input}

### Response:
{reply}
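
If you are building the prompt yourself rather than through a frontend, the template above can be filled in with something like the sketch below. The function name `build_prompt` is made up for illustration; the prompt ends right after the response header so the model writes the reply.

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Fill the Instruction/Input/Response template and leave the reply open."""
    return (
        "### Instruction:\n"
        f"{system_prompt}\n\n"
        "### Input:\n"
        f"{user_input}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "Tell me about dolphins."))
```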

Kooten on discord.