liuhaotian committed on
Commit
3d4b5d4
1 Parent(s): d1e2541

Update app.py

Files changed (1)
  1. app.py +4 -3
app.py CHANGED
@@ -343,11 +343,12 @@ title_markdown = """
  ONLY WORKS WITH GPU!
 
  You can load the model with 4-bit or 8-bit quantization to make it fit in smaller hardwares. Setting the environment variable `bits` to control the quantization.
+ *Note: 8-bit seems to be slower than both 4-bit/16-bit. Although it has enough VRAM to support 8-bit, until we figure out the inference speed issue, we recommend 4-bit for A10G for the best efficiency.*
 
  Recommended configurations:
- | Hardware | T4-Small (16G) | A10G-Medium (24G) | A100-Large (40G) |
- |-------------------|-----------------|-------------------|------------------|
- | **Bits** | 4 (default) | 8 | 16 |
+ | Hardware | T4-Small (16G) | A10G-Small (24G) | A100-Large (40G) |
+ |-------------------|-----------------|------------------|------------------|
+ | **Bits** | 4 (default) | 4 | 16 |
 
  """
 
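
The changed text says the `bits` environment variable controls quantization and that 4 is the default. The rest of `app.py` is not shown in this diff, so the following is only a minimal sketch of how such a variable could be read and validated. The function names `read_bits` and `loader_flags` and the returned keyword names are hypothetical and are not taken from the commit. Only the `RECOMMENDED_BITS` values come from the updated table.

```python
import os

# Recommended settings copied from the updated table in this commit.
RECOMMENDED_BITS = {"T4-Small": 4, "A10G-Small": 4, "A100-Large": 16}


def read_bits(env=None, default=4):
    """Return the bit width from the `bits` environment variable.

    `env` is any mapping of strings. It falls back to os.environ, which
    makes the function easy to exercise without touching the real
    environment.
    """
    env = os.environ if env is None else env
    raw = env.get("bits", str(default))
    bits = int(raw)
    if bits not in (4, 8, 16):
        raise ValueError(f"bits must be 4, 8, or 16, got {raw!r}")
    return bits


def loader_flags(bits):
    """Hypothetical mapping from bit width to model-loader keyword flags.

    16 means no quantization, so both flags are False in that case.
    """
    return {"load_in_4bit": bits == 4, "load_in_8bit": bits == 8}


if __name__ == "__main__":
    bits = read_bits()
    print(bits, loader_flags(bits))
```

Running the script with `bits=8` set in the environment selects the 8-bit flags. Leaving the variable unset falls back to 4, which matches the "4 (default)" entry in the table.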