exl2 quants for meow
This repository contains exl2 quantizations of the meow model by Rishiraj Acharya. meow is a fine-tune of SOLAR-10.7B-Instruct-v1.0 on the no_robots dataset.
Current models
exl2 BPW | Model Branch | Model Size | Minimum VRAM (4096 Context, fp16 cache) |
---|---|---|---|
2-Bit | main | 3.28 GB | 6GB GPU |
4-Bit | 4bit | 5.61 GB | 8GB GPU |
5-Bit | 5bit | 6.92 GB | 10GB GPU, 8GB with swap |
6-Bit | 6bit | 8.23 GB | 10GB GPU |
8-Bit | 8bit | 10.84 GB | 12GB GPU |
Note
On a 12GB Nvidia GeForce RTX 3060, the 8-bit quant averaged around 20 tokens per second at the full 4096-token context.
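If you want to pick a branch programmatically, the minimum-VRAM column in the table above can be turned into a small lookup. This is only a sketch: the `QUANTS` list and the `pick_branch` helper are hypothetical names made up for illustration, the numbers are copied from the table (4096 context, fp16 cache), and the "8GB with swap" option for the 5-bit quant is ignored.

```python
# Minimum VRAM per branch, copied from the table above (4096 context, fp16 cache).
# Each entry is (branch name, bits per weight, minimum GPU VRAM in GB).
# The 5-bit row also lists "8GB with swap"; this sketch uses the 10 GB figure.
QUANTS = [
    ("main", 2, 6),
    ("4bit", 4, 8),
    ("5bit", 5, 10),
    ("6bit", 6, 10),
    ("8bit", 8, 12),
]


def pick_branch(vram_gb):
    """Return the highest-BPW branch whose listed minimum fits in vram_gb, or None."""
    fitting = [(bpw, branch) for branch, bpw, min_vram in QUANTS if min_vram <= vram_gb]
    if not fitting:
        return None
    # max() compares on bits per weight first, so the largest fitting quant wins.
    return max(fitting)[1]


if __name__ == "__main__":
    for vram in (4, 6, 8, 10, 12):
        print(vram, "GB ->", pick_branch(vram))
```

The listed minimums are a starting point rather than a guarantee. Other context lengths or cache settings will change the actual requirement.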
Where to use
You can run an exl2 model in several places. Here are a few:
- tabbyAPI
- Aphrodite Engine
- ExUI
- oobabooga's Text Gen Webui
  - When using the downloader, enter the model as Anthonyg5005/rishiraj-meow-10.7B-exl2:QuantBranch, replacing QuantBranch with a branch name from the table above (for example, 4bit).
- KoboldAI (Clone repo, don't use snapshot)
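Outside the webui downloader, the same repo:branch form can be split and passed to the `huggingface_hub` library, which accepts a branch name as the `revision` argument of `snapshot_download`. The sketch below assumes `huggingface_hub` is installed. `split_model_spec` and `download_quant` are hypothetical helper names for illustration, not part of any of the tools listed above.

```python
def split_model_spec(spec, default_branch="main"):
    """Split 'user/repo:branch' into (repo_id, branch); fall back to default_branch."""
    repo_id, sep, branch = spec.partition(":")
    return repo_id, (branch if sep and branch else default_branch)


def download_quant(spec, local_dir):
    """Download one quant branch into local_dir and return the folder path."""
    # Imported lazily so split_model_spec works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, branch = split_model_spec(spec)
    # revision selects the git branch that holds the chosen quant.
    return snapshot_download(repo_id=repo_id, revision=branch, local_dir=local_dir)


if __name__ == "__main__":
    spec = "Anthonyg5005/rishiraj-meow-10.7B-exl2:4bit"
    print(split_model_spec(spec))
    # Uncomment to download (needs network access and several GB of disk space):
    # download_quant(spec, "models/meow-4bit")
```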
WARNING
This model cannot be used commercially because of the Alpaca dataset license. Use it only for research or personal purposes.