Update README.md
README.md CHANGED
@@ -9,8 +9,6 @@ license: apache-2.0
The **Molmo-7B-GPTQ-4bit** model is a transformer-based model fine-tuned for NLP tasks. It has been quantized to 4-bit precision for efficient deployment. The model was prepared with **bitsandbytes** for 4-bit quantization rather than **AutoGPTQ**, which does not currently provide native support for this model format. The quantization uses `BitsAndBytesConfig` from the `transformers` library, enabling highly optimized GPU inference with reduced memory usage.

-## Model Card
-
<div align="center">
  <img src="https://molmo.allenai.org/opengraph-image.png" alt="Model Architecture" width="80%" />
</div>
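The README above describes loading this checkpoint through `BitsAndBytesConfig` from `transformers`. Below is a minimal loading sketch under stated assumptions: the repository id (`allenai/Molmo-7B-D-0924`) is a placeholder, the NF4 quant type and bfloat16 compute dtype are common choices rather than confirmed settings for this checkpoint, and `trust_remote_code=True` is assumed because Molmo ships custom modeling code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

# bitsandbytes 4-bit quantization config; NF4 + bfloat16 compute are assumptions,
# not confirmed settings for this particular checkpoint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "allenai/Molmo-7B-D-0924"  # placeholder repo id; substitute the actual repository

# Load the model with on-the-fly 4-bit quantization and automatic device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # assumed: Molmo uses custom modeling code on the Hub
)

# The processor handles tokenization (and image preprocessing for Molmo inputs).
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```

Loading this way is the contrast the README draws with AutoGPTQ: bitsandbytes quantizes the weights at load time, so no pre-packed GPTQ checkpoint is required.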