---
license: cc-by-nc-4.0
base_model:
- upstage/SOLAR-10.7B-v1.0
---

## SOLAR-10.7B-Instruct-v1.0-128k-exl2

Model: [SOLAR-10.7B-Instruct-v1.0-128k](https://huggingface.co/CallComply/SOLAR-10.7B-Instruct-v1.0-128k)

Made by: [CallComply](https://huggingface.co/CallComply)

Based on original model: [SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)

Created by: [upstage](https://huggingface.co/upstage)

## List of quants:

[4bpw h8 (main)](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/main)

[4.65bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/4.65bpw-h8)

[5bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/5bpw-h8)

[5.5bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/5.5bpw-h8)

[6bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/6bpw-h8)

[8bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/8bpw-h8)

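As a rough guide to which quant fits a given GPU, weight file size scales with bits per weight. A back-of-envelope sketch (not from the original card; it assumes ~10.7B weight parameters and ignores embeddings and quantization overhead):

```python
def quant_size_gb(n_params: float, bpw: float) -> float:
    # EXL2 stores roughly `bpw` bits per weight parameter.
    return n_params * bpw / 8 / 1e9

# Approximate weight-only sizes for a 10.7B-parameter model:
for bpw in (4.0, 4.65, 5.0, 5.5, 6.0, 8.0):
    print(f"{bpw}bpw ~ {quant_size_gb(10.7e9, bpw):.1f} GB")
```

Actual files on disk will be somewhat larger; treat these numbers as a lower bound when budgeting VRAM.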
Quantized with Exllamav2 0.0.11 using the default dataset.

## My notes about this model:

I tried to load the 4bpw version of the model in Text-Generation-WebUI, but it didn't set RoPE scaling automatically even though it is defined in the config file.

At high context it starts writing gibberish when RoPE scaling isn't set, so I tested it with compress_pos_emb set to 4 and it was able to retrieve details from a 16000-token prompt.

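The gibberish failure mode comes from position indices running past the model's native context window; linear RoPE scaling (compress_pos_emb) divides positions by the compression factor so they stay inside the trained range. A minimal sketch, assuming a 4096-token native context (these are hypothetical helpers for illustration, not WebUI code):

```python
def needed_compress_pos_emb(target_ctx: int, native_ctx: int = 4096) -> int:
    # Smallest integer factor that squeezes the target positions
    # into the model's native position range (ceiling division).
    return -(-target_ctx // native_ctx)

def scaled_position(pos: int, compress_pos_emb: int) -> float:
    # Linear RoPE scaling divides each position index by the factor
    # before the rotary embedding is computed.
    return pos / compress_pos_emb

print(needed_compress_pos_emb(16000))  # factor that covers a 16000-token prompt
print(scaled_position(15999, 4))       # last position after compression
```

With a factor of 4, position 15999 maps to 3999.75, safely under a native 4096-position range, which matches the setting that worked in the test above.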
With my 12GB VRAM GPU I could load the model with about 30000 tokens of context, or 32768 tokens with the 8-bit cache option.

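As a rough sanity check on those numbers, the KV cache cost can be estimated from the model shape. A back-of-envelope sketch, assuming SOLAR-10.7B's commonly reported configuration of 48 layers, 8 KV heads, and a head dimension of 128 (verify these against the model's config.json before relying on them):

```python
def kv_cache_bytes(ctx: int, n_layers: int = 48, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # One K and one V tensor per layer, each ctx * n_kv_heads * head_dim elements.
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

fp16_gib = kv_cache_bytes(32768) / 2**30                      # FP16 cache
int8_gib = kv_cache_bytes(32768, bytes_per_elem=1) / 2**30    # 8-bit cache option
print(f"Cache at 32768 tokens: {fp16_gib:.1f} GiB FP16, {int8_gib:.1f} GiB 8-bit")
```

Under these assumptions the 8-bit cache halves the cache footprint, which is consistent with squeezing the full 32768-token context into 12GB alongside the quantized weights.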
It's the first Yarn model that has worked for me; perhaps other Yarn models also required setting RoPE scaling manually.

## How to run

This quantization method runs on GPU and requires the Exllamav2 loader, which is available in the following applications:

[Text Generation Webui](https://github.com/oobabooga/text-generation-webui)

[KoboldAI](https://github.com/henk717/KoboldAI)

[ExUI](https://github.com/turboderp/exui)

## Original model card:

# **Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!**

# **With 128k Context!**