TheBloke committed on
Commit
90b4713
1 Parent(s): fdc3e58

Update README.md

Files changed (1):
  README.md +6 -5
README.md CHANGED
@@ -48,6 +48,10 @@ This repo contains GPTQ model files for [Alpin's Goliath 120B](https://huggingfa
 
 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
+**NOTE**: The 4-bit models have been sharded, as otherwise they cannot be uploaded on HF due to the 50GB file limit. This means they will not work with AutoGPTQ at the time of writing.
+
+They will work fine with ExLlama, TGI, and via Transformers.
+
 These files were quantised using hardware kindly provided by [Massed Compute](https://massedcompute.com/).
 
 <!-- description end -->
@@ -68,13 +72,10 @@ You are a helpful AI assistant.
 
 USER: {prompt}
 ASSISTANT:
-
 ```
 
 <!-- prompt-template end -->
 
-
-
 <!-- README_GPTQ.md-compatible clients start -->
 ## Known compatible clients / servers
 
@@ -112,8 +113,8 @@ Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with T
 
 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
-| [main](https://huggingface.co/TheBloke/goliath-120b-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 9.99 GB | Yes | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
-| [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/goliath-120b-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 9.95 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
+| [main](https://huggingface.co/TheBloke/goliath-120b-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 58.36 GB | Yes | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
+| [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/goliath-120b-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 60.56 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
 | [gptq-3bit--1g-actorder_True](https://huggingface.co/TheBloke/goliath-120b-GPTQ/tree/gptq-3bit--1g-actorder_True) | 3 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 45.11 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
 | [gptq-3bit-128g-actorder_True](https://huggingface.co/TheBloke/goliath-120b-GPTQ/tree/gptq-3bit-128g-actorder_True) | 3 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1) | 4096 | 47.25 GB | No | 3-bit, with group size 128g and act-order. Higher quality than 128g-False. |
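The prompt template touched by the middle hunk is the Vicuna-style format shown in the README (system line "You are a helpful AI assistant.", then `USER:`/`ASSISTANT:`). A minimal sketch of filling it in Python — `format_prompt` is an illustrative helper, not part of the repo, and the exact whitespace between the system line and `USER:` is assumed from the template excerpt:

```python
def format_prompt(prompt: str, system: str = "You are a helpful AI assistant.") -> str:
    """Fill the Vicuna-style template from the README's prompt-template section.

    The blank line between the system message and USER: is an assumption
    based on the template excerpt visible in the diff.
    """
    return f"{system}\n\nUSER: {prompt}\nASSISTANT:"

print(format_prompt("What is GPTQ?"))
# → You are a helpful AI assistant.
#
#   USER: What is GPTQ?
#   ASSISTANT:
```

The template ends at `ASSISTANT:` with no trailing newline, so the model's generation continues directly after the colon.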
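The corrected sizes in the last hunk line up with simple arithmetic: at roughly 117B parameters (an assumed count for Goliath 120B, not stated in this diff), 4 bits per weight comes to about 58 GB, which is why the 4-bit files exceed the 50GB per-file limit cited in the NOTE and had to be sharded, while the 3-bit files fit in a single file. A back-of-the-envelope sketch:

```python
def approx_gptq_gb(n_params: float, bits: int) -> float:
    """Rough GPTQ weight size in decimal GB: parameters x bits / 8 bits-per-byte.

    Real GPTQ files are somewhat larger or smaller than this, since they also
    store quantisation scales/zero-points and some tensors in higher precision.
    """
    return n_params * bits / 8 / 1e9

HF_FILE_LIMIT_GB = 50  # per-file upload limit cited in the README's NOTE

for bits in (4, 3):
    size = approx_gptq_gb(117e9, bits)  # ~117B params assumed for Goliath 120B
    print(f"{bits}-bit: ~{size:.1f} GB, sharding needed: {size > HF_FILE_LIMIT_GB}")
# → 4-bit: ~58.5 GB, sharding needed: True
# → 3-bit: ~43.9 GB, sharding needed: False
```

This matches the table: the 4-bit branches (58.36 GB and 60.56 GB) are over the limit and sharded, while the 3-bit branches (45.11 GB and 47.25 GB) are under it.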