elinas committed
Commit 4b5949d
1 Parent(s): db2e311

Update README.md

Files changed (1): README.md +4 -10
README.md CHANGED
@@ -8,20 +8,14 @@ tags:
 
 # vicuna-13b-4bit
 Converted `vicuna-13b` to GPTQ 4bit using `true-sequential` and `groupsize 128` in `safetensors` for the best possible model performance.
-This does **not** support `llama.cpp` or any other cpp implementations, only `cuda` or `triton`. These implementations require a different format to use.
+This does **not** support `llama.cpp` or any other cpp implementations; only `cuda` is supported. These implementations require a different format to use.
 
 Vicuna is a high coherence model based on Llama that is comparable to ChatGPT. Read more here: https://vicuna.lmsys.org/
 
-GPTQ - https://github.com/qwopqwop200/GPTQ-for-LLaMa
-
-# Important - Update 2023-04-03
-Recent GPTQ commits have introduced breaking changes to model loading, and you should use commit `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773` in the `cuda` branch.
-
-If you're not familiar with the Git process:
-1. `git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773`
-2. `git switch -c cuda-stable`
-
-This creates and switches to a `cuda-stable` branch so you can continue using the quantized models.
+# Important - Update 2023-04-05
+Recent GPTQ commits have introduced breaking changes to model loading, and you should use this fork for a stable experience: https://github.com/oobabooga/GPTQ-for-LLaMa
+
+Currently only `cuda` is supported.
 
 # Usage
 1. Run manually through GPTQ
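
The commit-pinning steps the earlier README revision gave (`git checkout <commit>` then `git switch -c cuda-stable`) can be sketched end-to-end on a throwaway repo. This is a minimal illustration of the pattern only; the repo, commit messages, and branch contents here are made up and are not part of the GPTQ-for-LLaMa project.

```shell
# Illustrative only: reproduce the "pin to a known-good commit" workflow
# from the update note, using a scratch repo so it runs anywhere.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo && cd demo

# A known-good commit, followed by a later "breaking" commit.
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "known good"
good=$(git rev-parse HEAD)
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "breaking change"

git checkout -q "$good"      # step 1: detach HEAD at the known-good commit
git switch -q -c cuda-stable # step 2: create and switch to a branch there
git branch --show-current
```

Pulling in the real repo, the same two commands with the pinned hash `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773` give you a local `cuda-stable` branch that ignores later breaking commits.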