
exl2 quants for meow

This repository contains exl2 quantizations of the meow model by Rishiraj Acharya. meow is a fine-tune of SOLAR-10.7B-Instruct-v1.0 on the no_robots dataset.

Current models

| exl2 BPW | Model Branch | Model Size | Minimum VRAM (4096 context, fp16 cache) |
| --- | --- | --- | --- |
| 2-Bit | main | 3.28 GB | 6 GB GPU |
| 4-Bit | 4bit | 5.61 GB | 8 GB GPU |
| 5-Bit | 5bit | 6.92 GB | 10 GB GPU, 8 GB with swap |
| 6-Bit | 6bit | 8.23 GB | 10 GB GPU |
| 8-Bit | 8bit | 10.84 GB | 12 GB GPU |

Note

Using a 12 GB NVIDIA GeForce RTX 3060, I averaged around 20 tokens per second on the 8-bit quant at the full 4096-token context.
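
Each quant lives on its own branch, so you only need to download the one you want. Here is a minimal sketch using the huggingface_hub library; the local_dir path is an arbitrary example, point it wherever you keep models:

```python
from huggingface_hub import snapshot_download

# "revision" selects the repo branch, matching the Model Branch column above.
# Example: grab the 4-bit quant (branch "4bit").
snapshot_download(
    repo_id="Anthonyg5005/rishiraj-meow-10.7B-exl2",
    revision="4bit",
    local_dir="models/rishiraj-meow-10.7B-exl2-4bit",  # example path
)
```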

Where to use

There are several places you can run an exl2 model; a few options are ExLlamaV2's own example scripts, text-generation-webui, and TabbyAPI. A minimal loading sketch follows below.
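
For running the model directly in Python, here is a minimal sketch based on the exllamav2 package's example scripts. The exact API can differ between versions, and the model path is a placeholder for wherever you downloaded the quant:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the directory holding the downloaded quant.
config = ExLlamaV2Config()
config.model_dir = "models/rishiraj-meow-10.7B-exl2-4bit"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # fp16 cache, as in the VRAM table above
model.load_autosplit(cache)               # split weights across available VRAM
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("What does a cat say?", settings, num_tokens=128))
```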

WARNING

This model cannot be used commercially due to the Alpaca dataset license. Use it only for research or personal purposes.
