| Metric | Original Model | Quantized Model |
|-----------------------------|----------------------------|------------------------------------------------------|
| Parameters | 8.03B | 2.04B |
| Peak Memory Usage | 20.15 GB | 4.22 GB |
| MMLU Accuracy | 60.9% | 45.5% |

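The measurement method behind these figures is not documented here; as an assumption, numbers like the peak-memory row could be reproduced with PyTorch's allocator statistics, for example:

```python
import torch

# Hypothetical measurement sketch: the card does not state how peak memory was recorded.
torch.cuda.reset_peak_memory_stats()
# ... load the model and run a representative generation here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3  # bytes -> GiB
print(f"Peak memory usage: {peak_gb:.2f} GB")
```
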
**Model Architecture**
The Llama 3.1 8B model is a state-of-the-art language model designed for a wide range of conversational and text generation tasks. By applying AQLM (Additive Quantization of Language Models), a compression method developed by Yandex Research, the model's size has been significantly reduced without sacrificing its capabilities. AQLM approximates groups of weights as sums of vectors drawn from small learned codebooks, which preserves quality even at very low bit-widths.

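As a rough illustration of the idea (a toy sketch, not the actual AQLM code; the shapes below are arbitrary, and this model's "1x16" configuration denotes one codebook with 16-bit codes), dequantizing a weight group amounts to summing one learned code vector per codebook:

```python
import numpy as np

# Toy additive-quantization sketch; shapes are illustrative, not AQLM's real config.
num_codebooks, codebook_size, group_dim = 2, 256, 8
codebooks = np.random.randn(num_codebooks, codebook_size, group_dim).astype(np.float32)

# A quantized weight group is stored as one small index per codebook...
codes = np.array([17, 201])

# ...and dequantized by summing the selected code vectors.
weight_group = codebooks[np.arange(num_codebooks), codes].sum(axis=0)
print(weight_group.shape)  # (8,) -- a reconstructed group of 8 weights
```
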
The model was compressed on Vast.ai using 8x A100 GPUs; the process took approximately 5-6 hours.

**Evaluations**
The quantized Llama 3.1 8B model was rigorously evaluated on the [MMLU (Massive Multitask Language Understanding)](https://huggingface.co/datasets/tasksource/mmlu) dataset, available on Hugging Face, which is designed to evaluate the performance of language models across a wide range of subjects. It includes multiple-choice questions covering diverse topics such as math, history, law, and ethics. Each question is accompanied by four possible answers, making it an ideal benchmark for assessing the accuracy and generalization capabilities of language models.

For the evaluation of our model, we used the Hugging Face `transformers` and `datasets` libraries. Below is an example of how to load the evaluation data:
```python
from datasets import load_dataset

subset = "abstract_algebra"  # any MMLU subject config name
dataset = load_dataset("tasksource/mmlu", subset, split="validation")
```
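The complete evaluation is in the Colab notebook linked below. As a simplified, hypothetical sketch of scoring MMLU's four-way multiple-choice format (it assumes the dataset's `question`/`choices`/integer `answer` fields and a `model` and `tokenizer` loaded as in the "How to use" section):

```python
import torch

choices = ["A", "B", "C", "D"]
correct = 0
for ex in dataset:
    # Format the question with its four lettered options and ask for the answer.
    options = "\n".join(f"{c}. {a}" for c, a in zip(choices, ex["choices"]))
    prompt = f"{ex['question']}\n{options}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    # Compare next-token scores of the four answer letters.
    letter_ids = [tokenizer(f" {c}", add_special_tokens=False).input_ids[-1] for c in choices]
    pred = max(range(4), key=lambda i: logits[letter_ids[i]].item())
    correct += int(pred == ex["answer"])  # `answer` assumed to be a 0-3 index
print(f"Accuracy: {correct / len(dataset):.1%}")
```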
[Colab Notebook with Model Evaluation](https://colab.research.google.com/drive/16hXI7pd9KSTeUMNfGCB0wGMAzcBqzdZM?usp=sharing)

<img src="https://huggingface.co/azhiboedova/Meta-Llama-3.1-8B-Instruct-AQLM-2Bit-1x16/resolve/main/Model%20Performance%20by%20Subject.png" alt="Model Performance by Subject" width="500">

**How to use**
To load this model in Python and run it, you can use the following code:

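A minimal sketch using the standard `transformers` API (an assumption; AQLM inference additionally requires the `aqlm` package, e.g. `pip install aqlm[gpu]`):

```python
# Assumed usage sketch based on the standard transformers loading API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "azhiboedova/Meta-Llama-3.1-8B-Instruct-AQLM-2Bit-1x16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain quantization in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```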