Commit b414226
Parent(s): 1ba714c
Update README.md
README.md CHANGED

@@ -17,7 +17,7 @@ __The 1x16g16 models require aqlm inference library v1.1.6 or newer:__
 `pip install aqlm[gpu,cpu]>=1.1.6`
 
 
-Note that a large portion of this model are the 16-bit embeddings/logits matrices. You can significantly reduce the model footprint by quantizing these matrices, e.g. using `bitsandbytes` LLM.int8 or NF4 formats.
+Note that a large portion of this model are the 16-bit embeddings/logits matrices. You can significantly reduce the model footprint by quantizing these matrices, e.g. using `bitsandbytes` LLM.int8 or NF4 formats. This does not require additional training.
 
 
 | Model | AQLM scheme | WikiText 2 PPL | Model size, Gb | Hub link |
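To illustrate what the added sentence means by quantizing the embeddings/logits matrices without additional training, here is a minimal sketch using `bitsandbytes` NF4 quantization. This is not code from the commit: the tensor below is a placeholder standing in for something like `model.lm_head.weight`, and the shapes are made up.

```python
# Sketch only (not from this commit): shrinking an fp16 logits matrix
# with bitsandbytes NF4 quantization; no retraining is involved.
import torch
import bitsandbytes.functional as bf

# Placeholder for a real embeddings/logits matrix, e.g. model.lm_head.weight.
weight = torch.randn(32000, 4096, dtype=torch.float16, device="cuda")

# Pack the matrix into 4-bit NF4 blocks; quant_state holds the per-block
# statistics needed to reverse the quantization.
q_weight, quant_state = bf.quantize_nf4(weight)

# Dequantize on demand whenever the full-precision matrix is needed.
restored = bf.dequantize_nf4(q_weight, quant_state)

print(weight.element_size() * weight.nelement())      # bytes at fp16
print(q_weight.element_size() * q_weight.nelement())  # bytes packed as NF4
```

For the LLM.int8 format mentioned alongside NF4, the analogous route would be `bitsandbytes.nn.Linear8bitLt` in place of the fp16 linear layer; either way the quantization is applied post hoc to the stored weights.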