---
datasets:
- c-s-ale/alpaca-gpt4-data
- Open-Orca/OpenOrca
- Intel/orca_dpo_pairs
- allenai/ultrafeedback_binarized_cleaned
- HuggingFaceH4/no_robots
license: cc-by-nc-4.0
language:
  - en
library_name: ExLlamaV2
pipeline_tag: text-generation
tags:
  - Mistral
  - SOLAR
  - Quantized Model
  - exl2
base_model:
  - rishiraj/meow
---

# exl2 quants for meow

This repository contains exl2 quantized versions of the [meow](https://huggingface.co/rishiraj/meow) model by [Rishiraj Acharya](https://huggingface.co/rishiraj). meow is a fine-tune of [SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) on the [no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots) dataset.

## Current models

| exl2 BPW | Model Branch | Model Size | Minimum VRAM (4096 Context, fp16 cache) |
|-|-|-|-|
| 2-Bit | main | 3.28 GB | 6GB GPU |
| 4-Bit | 4bit | 5.61 GB | 8GB GPU |
| 5-Bit | 5bit | 6.92 GB | 10GB GPU, 8GB with swap |
| 6-Bit | 6bit | 8.23 GB | 10GB GPU |
| 8-Bit | 8bit | 10.84 GB | 12GB GPU |
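
As a rough sanity check on these sizes, an exl2 quant's weight footprint scales with parameter count times bits per weight. The sketch below assumes ~10.7B weights and illustrative bpw values, and ignores the context cache and other overhead (which is why the table's VRAM figures are higher):

```python
# Back-of-envelope exl2 size estimate: bytes ≈ parameters * bpw / 8.
# The 10.7e9 parameter count and bpw values are assumptions for illustration.
params = 10.7e9
for bpw in (2.4, 4.0, 5.0, 6.0, 8.0):
    gb = params * bpw / 8 / 1e9
    print(f"{bpw:.1f} bpw ≈ {gb:.2f} GB of weights (before cache/overhead)")
```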

### Note

On a 12GB NVIDIA GeForce RTX 3060, I averaged around 20 tokens per second with the 8-bit quant at the full 4096 context.

## Where to use

There are several places you can run an exl2 model; here are a few (a minimal Python loading sketch follows the list):

- [tabbyAPI](https://github.com/theroyallab/tabbyAPI)
- [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine)
- [ExUI](https://github.com/turboderp/exui)
- [oobabooga's Text Gen Webui](https://github.com/oobabooga/text-generation-webui)
  - When using the built-in downloader, format the model name like this: Anthonyg5005/rishiraj-meow-10.7B-exl2**\:QuantBranch**
- [KoboldAI](https://github.com/henk717/KoboldAI) (Clone repo, don't use snapshot)
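
You can also load an exl2 quant directly with the [ExLlamaV2](https://github.com/turboderp/exllamav2) library. Below is a minimal sketch adapted from ExLlamaV2's own examples; the model path, sampler values, and prompt format are illustrative assumptions (the prompt follows SOLAR-Instruct's `### User:`/`### Assistant:` template, which meow presumably inherits):

```python
# Minimal ExLlamaV2 generation sketch (adapted from exllamav2's examples).
# Path, sampler settings, and prompt format are assumptions, not repo specifics.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "rishiraj-meow-10.7B-exl2-5bpw"  # downloaded quant folder
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the KV cache while loading
model.load_autosplit(cache)               # auto-split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

prompt = "### User:\nWrite a haiku about cats.\n\n### Assistant:\n"
print(generator.generate_simple(prompt, settings, 128))  # 128 new tokens
```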

## How to download

### oobabooga's downloader

Use something like [download-model.py](https://github.com/oobabooga/text-generation-webui/blob/main/download-model.py) from text-generation-webui, which downloads over HTTP with Python requests.\
Install its requirements:

```shell
pip install requests tqdm
```

Example for downloading 5bpw:

```shell
python download-model.py Anthonyg5005/rishiraj-meow-10.7B-exl2:5bit
```
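
For reference, the core of what that script does is plain HTTP GETs against huggingface.co. A stripped-down single-file sketch (no resume or retry logic; `resolve/<branch>/<file>` is Hugging Face's standard download URL pattern):

```python
# Single-file download over HTTP, the same mechanism download-model.py uses.
# URL pattern: https://huggingface.co/<repo>/resolve/<branch>/<file>
import requests

url = ("https://huggingface.co/Anthonyg5005/rishiraj-meow-10.7B-exl2"
       "/resolve/5bit/config.json")
with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open("config.json", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)
```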

### huggingface-cli

You can also use huggingface-cli.\
To get it, install the huggingface-hub Python package:

```shell
pip install huggingface-hub
```

Example for 5bpw:

```shell
huggingface-cli download Anthonyg5005/rishiraj-meow-10.7B-exl2 --local-dir rishiraj-meow-10.7B-exl2-5bpw --revision 5bit
```
### Git LFS (not recommended)

I recommend the HTTP downloaders above over git: they can resume failed downloads and are much easier to work with.\
Make sure you have git and Git LFS installed.\
Example for downloading the 5bpw quant with git:

Make sure LFS file skipping is disabled:
```shell
# windows
set GIT_LFS_SKIP_SMUDGE=0
# linux
export GIT_LFS_SKIP_SMUDGE=0
```

Clone the desired quant branch:
```shell
git clone https://huggingface.co/Anthonyg5005/rishiraj-meow-10.7B-exl2 -b 5bit
```