Update README.md
First of all, we would like to express our gratitude to [PartAI](https://huggingface.co/PartAI) for their efforts in expanding large language models for the Persian language by releasing the ["Dorna"](https://huggingface.co/PartAI/Dorna-Llama3-8B-Instruct) model.
The quantized version of the "Dorna" language model requires only ~6GB of GPU memory to load, while the original model requires ~40GB.
This model is based on the AWQ quantization method, which reduces the model's size with minimal loss of accuracy by changing the data type of its weights.
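For illustration, below is a minimal sketch of how an AWQ checkpoint like this can be loaded with the `transformers` library; it assumes the `autoawq` package is installed, and the repository id is a placeholder rather than the actual repo name.

```python
# Minimal loading sketch. Assumptions: the AWQ weights are published on the
# Hugging Face Hub, and the repository id below is a placeholder — replace it
# with the actual quantized repo. Requires the `autoawq` package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Dorna-Llama3-8B-Instruct-AWQ"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)

# AWQ checkpoints load through the standard transformers API; the quantized
# weights are what keep the footprint near ~6GB instead of ~40GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Dorna is an instruct model, so prompt it through the chat template.
messages = [{"role": "user", "content": "پایتخت ایران کجاست؟"}]  # "What is the capital of Iran?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```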