---
license: cc-by-nc-4.0
datasets:
- victunes/nart-100k-synthetic-buddy-mixed-names
base_model: victunes/TherapyBeagle-11B-v2
inference: false
---
**GGUF:** https://huggingface.co/victunes/TherapyBeagle-11B-v2-GGUF

# TherapyBeagle-11B-v2-exl2
Original model: [TherapyBeagle-11B-v2](https://huggingface.co/victunes/TherapyBeagle-11B-v2)
Model creator: [victunes](https://huggingface.co/victunes)

## Quants
[4bpw h6](https://huggingface.co/cgus/TherapyBeagle-11B-v2-exl2/tree/main)
[4.25bpw h6](https://huggingface.co/cgus/TherapyBeagle-11B-v2-exl2/tree/4.25bpw-h6)
[4.65bpw h6](https://huggingface.co/cgus/TherapyBeagle-11B-v2-exl2/tree/4.65bpw-h6)
[5bpw h6](https://huggingface.co/cgus/TherapyBeagle-11B-v2-exl2/tree/5bpw-h6)
[6bpw h6](https://huggingface.co/cgus/TherapyBeagle-11B-v2-exl2/tree/6bpw-h6)
[8bpw h8](https://huggingface.co/cgus/TherapyBeagle-11B-v2-exl2/tree/8bpw-h8)

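Each quant sits on its own branch of this repo, so pass the branch name as `revision` when downloading. A quick sketch with `huggingface_hub` (the target directory is a placeholder):

```python
# Sketch: fetch one quant branch with huggingface_hub.
# Branch names match the Quants links above; local_dir is a placeholder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="cgus/TherapyBeagle-11B-v2-exl2",
    revision="6bpw-h6",  # "main" holds the 4bpw h6 quant
    local_dir="TherapyBeagle-11B-v2-exl2-6bpw-h6",
)
```
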
## Quantization notes
Made with exllamav2 0.0.18 using its default calibration dataset.
The original BF16 .bin files were converted to FP16 safetensors first.
When I compared 4bpw quants made from BF16 and FP16, the FP16 one lost only about 0.1% quality.
I went with the FP16 version because its files loaded quickly, while the quant made from BF16 loaded about 100s slower.
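For reference, a minimal sketch of that BF16-to-FP16 conversion step with `transformers` (output path is a placeholder, not the exact command used here):

```python
# Sketch: load the BF16 .bin checkpoint, cast to FP16, re-save as safetensors.
# Paths are placeholders; this needs enough RAM to hold the full model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "victunes/TherapyBeagle-11B-v2",
    torch_dtype=torch.float16,  # cast the BF16 weights to FP16 on load
    low_cpu_mem_usage=True,
)
model.save_pretrained("TherapyBeagle-11B-v2-fp16", safe_serialization=True)

tokenizer = AutoTokenizer.from_pretrained("victunes/TherapyBeagle-11B-v2")
tokenizer.save_pretrained("TherapyBeagle-11B-v2-fp16")
```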
Quantization metadata was removed from config.json so the model also loads with some older Text-Generation-WebUI versions.

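If you need to apply the same fix to another exl2 quant, here is a hedged sketch; the `quantization_config` key name is an assumption about where the metadata lives, so inspect your config.json first:

```python
# Sketch: drop the quantization metadata block from config.json.
# "quantization_config" is an assumption about the key exllamav2 writes;
# check your config.json before running this.
import json

with open("config.json") as f:
    config = json.load(f)

config.pop("quantization_config", None)  # no-op if the key is absent

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```
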
## How to run
This quantization method runs on GPU and requires the ExLlamaV2 loader, which is available in the following applications (a bare-bones Python example follows the list):

[Text Generation Webui](https://github.com/oobabooga/text-generation-webui)
[KoboldAI](https://github.com/henk717/KoboldAI)
[ExUI](https://github.com/turboderp/exui)
[lollms-webui](https://github.com/ParisNeo/lollms-webui)

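Outside those apps, you can also call the ExLlamaV2 Python API directly. A minimal sketch against the 0.0.18-era API; the model directory is a placeholder for whichever quant you downloaded:

```python
# Sketch: bare-bones generation with the exllamav2 0.0.18-era Python API.
# model_dir is a placeholder for a downloaded quant directory.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "TherapyBeagle-11B-v2-exl2-6bpw-h6"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # spread layers across available GPU memory
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Hello, Buddy.", settings, num_tokens=200))
```
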
# Original model card
# TherapyBeagle 11B v2

_Buddy is here for {{user}}._