Tijmen2 committed on
Commit
91e8a08
1 Parent(s): 24c8352

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -83,6 +83,29 @@ The following hyperparameters were used during QA tuning:
  - num_epochs: 2.0
  - weight_decay: 0.0
 
+ ## Versions
+
+ This repository contains:
+ - pytorch_model.bin: standard version (bfloat16)
+ - model.safetensors: same as pytorch_model.bin but in safetensors format
+ - gptq_model-8bit-128g.safetensors: 8-bit quantized version for inference speedup and low-VRAM GPUs
+ - gptq_model-4bit-128g.safetensors: 4-bit quantized version for even faster inference, lower VRAM requirements, lower quality
+
+ When using one of the quantized versions, make sure to pass the quantization configuration:
+ ```json
+ {
+ "bits": <4 or 8 depending on the version>,
+ "group_size": 128,
+ "damp_percent": 0.01,
+ "desc_act": false,
+ "static_groups": false,
+ "sym": true,
+ "true_sequential": true,
+ "model_name_or_path": null,
+ "model_file_base_name": null
+ }
+ ```
+
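The configuration added in this commit leaves `bits` as a placeholder, so it has to be specialized for each quantized weights file. A minimal sketch of that step follows, assuming the bit width and group size can be read from the `gptq_model-<bits>bit-<group>g.safetensors` naming shown in the list. The helper name `quantize_config_for` is hypothetical and not part of the repository or of any library. How the resulting dict is passed to a loader depends on the GPTQ library in use and is not shown here.

```python
import json
import re

# Fields copied from the JSON block in the README; "bits" is filled in per file.
BASE_CONFIG = {
    "group_size": 128,
    "damp_percent": 0.01,
    "desc_act": False,
    "static_groups": False,
    "sym": True,
    "true_sequential": True,
    "model_name_or_path": None,
    "model_file_base_name": None,
}


def quantize_config_for(weights_file: str) -> dict:
    """Hypothetical helper: build the quantization config for a GPTQ weights file.

    Assumes the file follows the `gptq_model-<bits>bit-<group>g.safetensors`
    naming used in this repository.
    """
    match = re.fullmatch(r"gptq_model-(\d+)bit-(\d+)g\.safetensors", weights_file)
    if match is None:
        raise ValueError(f"not a quantized weights file: {weights_file}")
    bits, group_size = int(match.group(1)), int(match.group(2))
    if bits not in (4, 8):
        raise ValueError(f"unsupported bit width: {bits}")
    # Start from the shared fields, then set the values encoded in the file name.
    config = {"bits": bits, **BASE_CONFIG}
    config["group_size"] = group_size
    return config


if __name__ == "__main__":
    # Print the config for the 4-bit file as JSON, matching the README layout.
    print(json.dumps(quantize_config_for("gptq_model-4bit-128g.safetensors"), indent=2))
```

Deriving `bits` from the file name keeps the config and the weights file from drifting apart; passing an unquantized file such as `pytorch_model.bin` raises a `ValueError` instead of silently producing a config.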
  ## Example output
 
  **User:**