---
license: cc-by-nc-4.0
base_model:
- upstage/SOLAR-10.7B-v1.0
---

## SOLAR-10.7B-Instruct-v1.0-128k-exl2

Model: [SOLAR-10.7B-Instruct-v1.0-128k](https://huggingface.co/CallComply/SOLAR-10.7B-Instruct-v1.0-128k)

Made by: [CallComply](https://huggingface.co/CallComply)

Based on original model: [SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)

Created by: [upstage](https://huggingface.co/upstage)

## List of quants:

[4bpw h8 (main)](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/main)

[4.65bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/4.65bpw-h8)

[5bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/5bpw-h8)

[5.5bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/5.5bpw-h8)

[6bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/6bpw-h8)

[8bpw h8](https://huggingface.co/cgus/SOLAR-10.7B-Instruct-v1.0-128k-exl2/tree/8bpw-h8)

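As a rough guide to which quant fits a given GPU, weight file size scales with bits per weight. A back-of-envelope sketch (not from the original card; it assumes ~10.7B weight parameters and ignores embeddings and quantization overhead):

```python
def quant_size_gb(n_params: float, bpw: float) -> float:
    # EXL2 stores roughly `bpw` bits per weight parameter.
    return n_params * bpw / 8 / 1e9

# Approximate weight-only sizes for a 10.7B-parameter model:
for bpw in (4.0, 4.65, 5.0, 5.5, 6.0, 8.0):
    print(f"{bpw}bpw ~ {quant_size_gb(10.7e9, bpw):.1f} GB")
```

Actual files on disk will be somewhat larger; treat these numbers as a lower bound when budgeting VRAM.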
Quantized with Exllamav2 0.0.11 using the default dataset.

## My notes about this model:

I tried to load the 4bpw version of the model in Text-Generation-WebUI, but it didn't set RoPE scaling automatically even though it is defined in the config file.

At high context it starts writing gibberish when RoPE scaling isn't set, so I tested it with compress_pos_emb set to 4 and it was able to retrieve details from a 16000-token prompt.

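The gibberish failure mode comes from position indices running past the model's native context window; linear RoPE scaling (compress_pos_emb) divides positions by the compression factor so they stay inside the trained range. A minimal sketch, assuming a 4096-token native context (these are hypothetical helpers for illustration, not WebUI code):

```python
def needed_compress_pos_emb(target_ctx: int, native_ctx: int = 4096) -> int:
    # Smallest integer factor that squeezes the target positions
    # into the model's native position range (ceiling division).
    return -(-target_ctx // native_ctx)

def scaled_position(pos: int, compress_pos_emb: int) -> float:
    # Linear RoPE scaling divides each position index by the factor
    # before the rotary embedding is computed.
    return pos / compress_pos_emb

print(needed_compress_pos_emb(16000))  # factor that covers a 16000-token prompt
print(scaled_position(15999, 4))       # last position after compression
```

With a factor of 4, position 15999 maps to 3999.75, safely under a native 4096-position range, which matches the setting that worked in the test above.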
With my 12GB VRAM GPU I could load the model with about 30000 tokens of context, or 32768 tokens with the 8-bit cache option.

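As a rough sanity check on those numbers, the KV cache cost can be estimated from the model shape. A back-of-envelope sketch, assuming SOLAR-10.7B's commonly reported configuration of 48 layers, 8 KV heads, and a head dimension of 128 (verify these against the model's config.json before relying on them):

```python
def kv_cache_bytes(ctx: int, n_layers: int = 48, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # One K and one V tensor per layer, each ctx * n_kv_heads * head_dim elements.
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

fp16_gib = kv_cache_bytes(32768) / 2**30                      # FP16 cache
int8_gib = kv_cache_bytes(32768, bytes_per_elem=1) / 2**30    # 8-bit cache option
print(f"Cache at 32768 tokens: {fp16_gib:.1f} GiB FP16, {int8_gib:.1f} GiB 8-bit")
```

Under these assumptions the 8-bit cache halves the cache footprint, which is consistent with squeezing the full 32768-token context into 12GB alongside the quantized weights.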
It's the first Yarn model that has worked for me; perhaps other Yarn models also required setting RoPE scaling manually.

## How to run

This quantization method runs on GPU and requires the Exllamav2 loader, which is available in the following applications:

[Text Generation Webui](https://github.com/oobabooga/text-generation-webui)

[KoboldAI](https://github.com/henk717/KoboldAI)

[ExUI](https://github.com/turboderp/exui)

## Original model card:

# **Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!**

# **With 128k Context!**