FlatDolphinMaid-8x7B 4bpw
Exllama quant of Undi95/FlatDolphinMaid-8x7B
You probably want the 3.5bpw version. It just fits in 24gb of vram at half context (16384).
If you really want the larger context 3bpw should do it but you are probably better of with the gguf version with higher quants.
I did make a 4bpw, it might work in a headless or multigpu setup.
Other BPW's 3.0bpw, 3.5bpw, 4.0bpw
Make sure you enable 8bit cache.
Promt format:
### Instruction:
{system prompt}
### Input:
{input}
### Response:
{reply}
Contact
Kooten on discord.
- Downloads last month
- 6
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.