ModelCloud/TinyLlama-1.1B-Chat-v1.0-autoround-4bit


lrl-modelcloud committed on Jul 12

Commit 0cab85d • 1 Parent(s): 66f1bfd

Create README.md

Files changed (1)

README.md +36 -0

README.md ADDED

	@@ -0,0 +1,36 @@

+This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).
+- **Bits**: 4
+- **Group Size**: 128
+- **Desc Act**: true
+- **Static Groups**: false
+- **Sym**: true
+- **LM Head**: false
+- **Damp Percent**: 0.01
+- **True Sequential**: true
+- **Model Name or Path**:
+- **Model File Base Name**: model
+- **Quant Method**: auto_round
+- **Checkpoint Format**: gptq
+- **Metadata**
+  - **Quantizer**: gptqmodel:0.9.8-dev0
+  - **Enable Full Range**: false
+  - **Batch Size**: 1
+  - **AMP**: true
+  - **LR Scheduler**: null
+  - **Enable Quanted Input**: true
+  - **Enable Minmax Tuning**: true
+  - **Learning Rate (LR)**: null
+  - **Minmax LR**: null
+  - **Low GPU Memory Usage**: true
+  - **Iterations (Iters)**: 200
+  - **Sequence Length (Seqlen)**: 2048
+  - **Number of Samples (Nsamples)**: 512
+  - **Sampler**: rand
+  - **Seed**: 42
+  - **Number of Blocks (Nblocks)**: 1
+  - **Gradient Accumulate Steps**: 1
+  - **Not Use Best MSE**: false
+  - **Dynamic Max Gap**: -1
+  - **Data Type**: int
+  - **Scale Data Type (Scale Dtype)**: fp16
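
As a rough illustration of what the settings listed in this README imply, the sketch below estimates the storage cost per weight and the size of the calibration set. It is not part of the commit and does not use GPTQModel. The dictionary keys and the helper names `effective_bits_per_weight` and `calibration_tokens` are hypothetical, chosen for this example. The estimate assumes each group of 128 weights stores one fp16 scale and one 4-bit zero point, and it ignores any extra index tensors a checkpoint may carry for `Desc Act`.

```python
# Values copied from the README above; the key names are illustrative only.
quant_settings = {
    "bits": 4,            # Bits
    "group_size": 128,    # Group Size
    "scale_bits": 16,     # Scale Data Type: fp16
    "nsamples": 512,      # Number of Samples
    "seqlen": 2048,       # Sequence Length
    "iters": 200,         # Iterations
}


def effective_bits_per_weight(bits, group_size, scale_bits, zero_bits):
    """Average bits per weight, counting per-group scale and zero point.

    Assumption: one scale and one zero point are stored per group.
    """
    return bits + (scale_bits + zero_bits) / group_size


def calibration_tokens(nsamples, seqlen):
    """Upper bound on calibration tokens if every sample fills seqlen."""
    return nsamples * seqlen


bpw = effective_bits_per_weight(
    quant_settings["bits"],
    quant_settings["group_size"],
    quant_settings["scale_bits"],
    zero_bits=quant_settings["bits"],  # assumption: zero point packed at 4 bits
)
tokens = calibration_tokens(quant_settings["nsamples"], quant_settings["seqlen"])

print(f"approx. bits per weight: {bpw}")
print(f"calibration tokens (upper bound): {tokens}")
```

Under these assumptions the per-group metadata adds 20 bits per 128 weights, so a smaller group size would raise the overhead and a larger one would lower it.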