elinas committed on
Commit
8092d4a
1 Parent(s): 33f0dd9

update on breaking changes

Browse files
Files changed (1)
  1. README.md +9 -0
README.md CHANGED
@@ -9,6 +9,15 @@ https://github.com/qwopqwop200/GPTQ-for-LLaMa
 
 LoRA credit to https://huggingface.co/baseten/alpaca-30b
 
+ # Update 2023-04-03
+ Recent GPTQ commits have introduced breaking changes to model loading, so you should use commit `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773` on the `cuda` branch.
+
+ If you're not familiar with the Git process:
+ 1. `git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773`
+ 2. `git switch -c cuda-stable`
+
+ This creates and switches to a `cuda-stable` branch so you can continue using the quantized models.
+
 # Update 2023-03-29
 There is also a non-groupsize quantized model that is 1 GB smaller in size, which should allow running at max context tokens with 24 GB of VRAM. The evaluations are better
 on the 128 groupsize version, but the tradeoff is not being able to run it at full context without offloading or a GPU with more VRAM.
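
For readers new to Git, here is a minimal end-to-end sketch of the steps in the added section, assuming you are starting from a fresh clone of the GPTQ-for-LLaMa repository linked at the top of the README. Only the repository URL, the commit hash, and the `cuda-stable` branch name come from the text above; the rest is standard Git usage.

```bash
# Minimal sketch, assuming a fresh clone of GPTQ-for-LLaMa
# (repo URL and commit hash are the ones given in the README).
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa

# Pin the working tree to the known-good commit on the cuda branch
git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773

# Create and switch to a local branch at that commit so future pulls
# on other branches don't move you off the pinned revision
git switch -c cuda-stable
```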