azhiboedova committed on
Commit ebac931 · verified · 1 Parent(s): e166324

Update README.md

Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -14,6 +14,13 @@ tags:
  - [Anastasiia Zhiboedova](https://www.linkedin.com/in/azhiboedova/)
  - [Mike Arbuzov](https://www.linkedin.com/in/mike-arbuzov/)
 
+ **Model Comparison: Quantized vs. Base Model**
+
+ | Metric            | Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-8B-Instruct-AQLM-2Bit-1x16 (quantized) |
+ |-------------------|----------------------------|-------------------------------------------------------|
+ | Parameters        | 8.03B                      | 2.04B                                                  |
+ | Peak Memory Usage | 20.15 GB                   | 4.22 GB                                                |
+
  **Model Architecture**
  The Llama 3.1 8B model is a state-of-the-art language model designed for a wide range of conversational and text generation tasks. By applying Additive Quantization of Language Models (AQLM), a compression method developed by Yandex Research, the model's size has been significantly reduced without sacrificing its powerful capabilities. AQLM represents groups of weights as sums of vectors from small learned codebooks, optimizing for both model quality and efficiency.
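
A toy sketch of the additive-codebook idea behind AQLM, with random stand-in codebooks rather than the learned, fine-tuned ones used in practice:

```python
import numpy as np

# Toy sketch of additive (codebook) quantization, not the real AQLM kernels.
# A group of consecutive weights is stored only as codebook indices and
# reconstructed as the sum of the selected code vectors. In the 1x16 scheme a
# group of 8 weights is encoded by one 16-bit index (16 bits / 8 weights =
# 2 bits per weight); two tiny codebooks are used here to keep the search small.
rng = np.random.default_rng(0)
group_size = 8
codebook_a = rng.normal(size=(16, group_size)).astype(np.float32)
codebook_b = rng.normal(size=(16, group_size)).astype(np.float32)

w = rng.normal(size=group_size).astype(np.float32)  # original weight group

# Encoding: brute-force the pair of code vectors whose sum best matches w.
best = min(
    ((i, j) for i in range(16) for j in range(16)),
    key=lambda ij: float(np.sum((w - codebook_a[ij[0]] - codebook_b[ij[1]]) ** 2)),
)
w_hat = codebook_a[best[0]] + codebook_b[best[1]]  # decoded (dequantized) group

print("stored indices:", best)
print("mean squared reconstruction error:", float(np.mean((w - w_hat) ** 2)))
```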
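
For the comparison table, a minimal sketch of how the reported figures could be checked, assuming the quantized checkpoint is published on the Hugging Face Hub under the name shown in the table (the owning namespace is not stated here) and that the `aqlm` package is installed alongside `transformers`:

```python
# Assumes: pip install transformers accelerate aqlm, plus a CUDA-enabled torch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: substitute the namespace that actually hosts the
# quantized checkpoint named in the table above.
model_id = "<namespace>/Meta-Llama-3.1-8B-Instruct-AQLM-2Bit-1x16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Parameter count as reported by transformers; compare with the table above.
print(f"parameters: {model.num_parameters() / 1e9:.2f}B")

# Rough peak-memory check after a short generation.
inputs = tokenizer("Explain additive quantization briefly.", return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=32)
if torch.cuda.is_available():
    print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```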