TheBloke committed
Commit d1603ef
1 Parent(s): 2d647fa

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -695,17 +695,17 @@ I did this using the simple *nix command `split`.
 
 To join the files on any *nix system, run:
 ```
- cat gptq_model-4bit--1g.split* > gptq_model-4bit--1g.safetensors
+ cat gptq_model-4bit--1g.JOINBEFOREUSE.split-*.safetensors > gptq_model-4bit--1g.safetensors
 ```
 
 To join the files on Windows, open a Command Prompt and run:
 ```
- COPY /B gptq_model-4bit--1g.splitaa + gptq_model-4bit--1g.splitab + gptq_model-4bit--1g.splitac gptq_model-4bit--1g.safetensors
+ COPY /B gptq_model-4bit--1g.JOINBEFOREUSE.split-a.safetensors + gptq_model-4bit--1g.JOINBEFOREUSE.split-b.safetensors + gptq_model-4bit--1g.JOINBEFOREUSE.split-c.safetensors gptq_model-4bit--1g.safetensors
 ```
 
 The SHA256SUM of the joined file will be:
 
- Once you have the joined file, you can safely delete `gptq_model-4bit--1g.split*`.
+ Once you have the joined file, you can safely delete `gptq_model-4bit--1g.JOINBEFOREUSE.split-*.safetensors`.
 
 ## Repositories available
 
@@ -714,11 +714,11 @@ Once you have the joined file, you can safely delete `gptq_model-4bit--1g.split*`.
 
 ## Two files provided - separate branches
 
- - Main branch:
+ - Main branch: `gptq_model-4bit--1g.safetensors`
  - Group Size = None
  - Desc Act (act-order) = True
  - This version will use the least possible VRAM, and should have higher inference performance in CUDA mode
- - Branch `group_size_128g`:
+ - Branch `group_size_128g`: `gptq_model-4bit-128g.safetensors`
  - Group Size = 128g
  - Desc Act (act-order) = True
  - This version will use more VRAM, which shouldn't be a problem as it shouldn't exceed 2 x 80GB or 3 x 48GB cards.
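For reference, a minimal way to check the joined file against the SHA256SUM value listed in the README is (assuming GNU coreutils `sha256sum` is available on the *nix side):

```
# Print the SHA256 of the joined file and compare it with the value given in the README
sha256sum gptq_model-4bit--1g.safetensors
```

On Windows, `certutil -hashfile gptq_model-4bit--1g.safetensors SHA256` prints the same digest from a Command Prompt.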
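Because the two quantisations live on separate branches of the same repo, one way to fetch only the variant you want is a single-branch clone. This is an illustrative sketch rather than part of the README, and the repository URL is a placeholder for this model's actual Hugging Face repo:

```
# Clone only the group_size_128g branch (Group Size = 128g, act-order = True);
# git-lfs is needed to pull the large .safetensors file
git clone --single-branch --branch group_size_128g https://huggingface.co/<this-repo>

# Omit --branch (or use --branch main) to get the default branch (Group Size = None, act-order = True)
```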