azhiboedova committed
Commit 6500162
Parent(s): 2be511f
Update README.md

README.md CHANGED
@@ -16,11 +16,11 @@ tags:
 
 **Model Comparison: Quantized vs Basic Model**
 
-| Model Type | Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-
-
-| Parameters | 8.03B | 2.04B
-| Peak Memory Usage | 20.15 GB | 4.22 GB
-| MMLU Accuracy | 60.9% | 45.5%
+| Model Type        | Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-2B-Instruct-AQLM-2Bit-1x16 |
+|-------------------|----------------------------|-------------------------------------------|
+| Parameters        | 8.03B                      | 2.04B                                     |
+| Peak Memory Usage | 20.15 GB                   | 4.22 GB                                   |
+| MMLU Accuracy     | 60.9%                      | 45.5%                                     |
 
 **Model Architecture**
 The Llama 3.1 8B model is a state-of-the-art language model designed for a wide range of conversational and text generation tasks. By applying Additive Quantization of Language Models (AQLM), a post-training compression method developed by Yandex Research, the model's memory footprint has been reduced several-fold at the cost of some accuracy (see the MMLU numbers above). AQLM represents groups of weights as sums of vectors from small learned codebooks; the 2Bit-1x16 suffix indicates one codebook with 16-bit codes, for roughly 2 bits per weight.
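A sketch of how the quantized checkpoint might be used and sanity-checked. The repo id below is an assumption (author name plus the model name from the table), and loading requires the `transformers` library with the `aqlm` package installed; the import is deferred into the function so the arithmetic helper stands alone. `packed_code_gib` estimates only the packed weight codes; it is not the full peak-memory figure from the table.

```python
def load_aqlm_model(repo_id: str = "azhiboedova/Meta-Llama-3.1-2B-Instruct-AQLM-2Bit-1x16"):
    """Load an AQLM-quantized checkpoint via Hugging Face transformers.

    Assumes `transformers`, `aqlm`, and `accelerate` are installed;
    the repo id above is a guess based on this commit's author and model name.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="auto",   # keep the dtypes stored in the checkpoint
        device_map="auto",    # place layers on available device(s)
    )
    return tokenizer, model


def packed_code_gib(n_params: float, bits_per_weight: float = 2.0) -> float:
    """Rough size in GiB of the packed weight codes alone."""
    return n_params * bits_per_weight / 8 / 2**30
```

At 2 bits per weight, `packed_code_gib(8.03e9)` comes to about 1.87 GiB; the remainder of the 4.22 GB peak in the table is codebooks, unquantized layers (embeddings, norms), and runtime buffers.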