---
license: cc-by-nc-4.0
base_model:
- upstage/SOLAR-10.7B-v1.0
---
## SOLAR-10.7B-Instruct-v1.0-128k-exl2
Model: [SOLAR-10.7B-Instruct-v1.0-128k](https://huggingface.co/CallComply/SOLAR-10.7B-Instruct-v1.0-128k)
Made by: [CallComply](https://huggingface.co/CallComply)

Based on original model: [SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)
Created by: [upstage](https://huggingface.co/upstage)

+ ## List of quants:
21
+ [4bpw h8 (main)](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/main)
22
+ [4.65bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/4.65bpw-h8)
23
+ [5bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/5bpw-h8)
24
+ [5.5bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/5.5bpw-h8)
25
+ [6bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/6bpw-h8)
26
+ [8bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/8bpw-h8)
27
+
28
+ Quantized with Exllamav2 0.0.11 with default dataset.
29
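
Each quant lives on its own branch, so a single one can be fetched with the `huggingface_hub` library. A minimal sketch; swap `revision` for the bpw branch you want:

```python
from huggingface_hub import snapshot_download

# Fetch only the 5bpw-h8 branch; "main" holds the 4bpw h8 quant.
model_path = snapshot_download(
    repo_id="cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2",
    revision="5bpw-h8",
)
print(model_path)  # local directory containing the quantized model
```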
## My notes about this model:
I tried to load the 4bpw version of the model in Text-Generation-WebUI, but it didn't set RoPE scaling automatically despite it being defined in the config file.
Without RoPE scaling the model starts writing gibberish at high context, so I tested it with 4x compress_pos_emb, and it was able to retrieve details from a 16000-token prompt.
With my 12GB VRAM GPU I could load the model with about 30000 tokens of context, or 32768 tokens with the 8-bit cache option.
It's the first YaRN model that worked for me; perhaps other YaRN models required RoPE scaling to be set manually too.
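
If your loader also fails to pick up the RoPE settings, they can be forced manually. A minimal sketch using the exllamav2 Python API (assuming the 0.0.11-era interface; the model path is a placeholder):

```python
from exllamav2 import ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "SOLAR-10.7B-Instruct-v1.0-128k-exl2"  # placeholder local path
config.prepare()                # reads config.json from the model directory
config.max_seq_len = 32768      # target context length
config.scale_pos_emb = 4.0      # equivalent to compress_pos_emb = 4 in the WebUI
```
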
## How to run

This quantization format runs on GPU and requires the Exllamav2 loader, which is available in the following applications:

[Text Generation Webui](https://github.com/oobabooga/text-generation-webui)

[KoboldAI](https://github.com/henk717/KoboldAI)

[ExUI](https://github.com/turboderp/exui)
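
The exllamav2 library can also run the quant directly from Python. A rough sketch assuming the 0.0.11-era API, a placeholder local model path, and the original SOLAR instruct prompt template:

```python
from exllamav2 import (
    ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "SOLAR-10.7B-Instruct-v1.0-128k-exl2"  # placeholder local path
config.prepare()
config.max_seq_len = 32768
config.scale_pos_emb = 4.0      # see the RoPE note above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # 8-bit cache to fit 32k on 12GB VRAM
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

prompt = "### User:\nSummarize the plot of Hamlet.\n\n### Assistant:\n"
print(generator.generate_simple(prompt, settings, 200))
```
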
## Original model card:
# **Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!**

# **With 128k Context!**