TheBloke committed
Commit ed78723
1 Parent(s): 9ad39a2

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -841,7 +841,7 @@ print(pipe(prompt_template)[0]['generated_text'])
 
  This will work with AutoGPTQ. It is untested with GPTQ-for-LLaMa. It will *not* work with ExLlama.
 
- It was created with group_size none (-1) to reduce VRAM usage, and with --act-order (desc_act) to increase inference speed.
+ It was created with group_size none (-1) to reduce VRAM usage, and with --act-order (desc_act) to improve accuracy of responses.
 
  * `gptq_model-4bit-128g.safetensors`
    * Works with AutoGPTQ in CUDA or Triton modes.
@@ -856,7 +856,7 @@ It was created with group_size none (-1) to reduce VRAM usage, and with --act-or
 
  This will work with AutoGPTQ. It is untested with GPTQ-for-LLaMa. It will *not* work with ExLlama.
 
- It was created with both group_size 128g and --act-order (desc_act) for increased inference quality.
+ It was created with both group_size 128g and --act-order (desc_act) for even higher inference accuracy, at the cost of increased VRAM usage. Because we already need 2 x 80GB or 3 x 48GB GPUs, I don't expect the increased VRAM usage to change the GPU requirements.
 
  **Note** Using group_size + desc_act together can significantly lower performance in AutoGPTQ CUDA. You might want to try AutoGPTQ Triton mode instead (Linux only.)
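
For reference, here is a minimal sketch (not part of this commit) of how the group_size 128 + desc_act file described above might be loaded with AutoGPTQ, using Triton mode as the note suggests. The repo ID below is a placeholder, and `device_map="auto"` is assumed to shard the model across the GPUs it requires.

```python
# Minimal sketch, not from the README diff itself.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/<this-repo>-GPTQ"    # placeholder: substitute the actual repo ID
model_basename = "gptq_model-4bit-128g"   # the group_size 128 + desc_act file

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename=model_basename,
    use_safetensors=True,
    use_triton=True,      # Triton mode (Linux only) avoids the CUDA slow-down with group_size + desc_act
    device_map="auto",    # a model this size must be sharded across multiple GPUs
)

prompt = "Tell me about AI"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```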