🦙 ALLaM-7B-Instruct-GGUF

This repository provides quantized GGUF versions of ALLaM-7B-Instruct, optimized for efficient inference using llama.cpp.

โš ๏ธ Acknowledgment

The original model was developed by ALLaM-AI and is available here:
🔗 ALLaM-7B-Instruct-Preview

This repository only provides quantized versions of that model, repackaged for efficient inference on a wider range of hardware.


✨ Overview

ALLaM-7B-Instruct is an Arabic-centric instruction-tuned model based on Meta's LLaMA architecture, designed for natural language understanding and generation in Arabic.

🚀 What's New?

✅ GGUF Format – Optimized for llama.cpp
✅ Multiple Quantization Levels – Balance between precision and efficiency
✅ Run on CPUs & Low-Resource Devices – No need for high-end GPUs!


📂 Available Model Quantizations

| Model Variant                 | Precision | Size       | Best For                  |
|-------------------------------|-----------|------------|---------------------------|
| ALLaM-7B-Instruct-f16.gguf    | FP16      | Large      | High-precision tasks      |
| ALLaM-7B-Instruct-Q8_0.gguf   | 8-bit     | Medium     | Balanced quality & speed  |
| ALLaM-7B-Instruct-Q6_K.gguf   | 6-bit     | Small      | Good trade-off            |
| ALLaM-7B-Instruct-Q5_0.gguf   | 5-bit     | Small      | Alternative quantization  |
| ALLaM-7B-Instruct-Q5_K_M.gguf | 5-bit     | Smaller    | Fast inference            |
| ALLaM-7B-Instruct-Q4_0.gguf   | 4-bit     | Very Small | Legacy format             |
| ALLaM-7B-Instruct-Q4_K_M.gguf | 4-bit     | Very Small | Low-memory devices        |
| ALLaM-7B-Instruct-Q2_K.gguf   | 2-bit     | Smallest   | Extreme efficiency        |
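
If you are unsure which quantization a downloaded file uses, the metadata is embedded in the GGUF file itself. As a minimal sketch, assuming the gguf Python package published from the llama.cpp repository (it provides a gguf-dump script):

```bash
# Assumes the gguf package from the llama.cpp repo; gguf-dump prints a
# GGUF file's header metadata, including its quantization type.
pip install gguf
gguf-dump ALLaM-7B-Instruct-Q4_K_M.gguf | head -n 40
```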

📖 Installation & Setup

1๏ธโƒฃ Install llama.cpp

Clone and build llama.cpp:

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
```
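
Note that recent llama.cpp versions have replaced the Makefile with CMake, and the tools are now prefixed with llama- (for example llama-cli instead of main). A sketch of the CMake route on a current checkout:

```bash
# CMake build used by newer llama.cpp versions; binaries such as
# llama-cli and llama-server are placed in build/bin.
cmake -B build
cmake --build build --config Release
```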

2๏ธโƒฃ Download the Model

Choose and download a .gguf file from this repository.
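
For example, using the Hugging Face CLI to fetch the Q4_K_M variant of this repository (repo id eltay89/ALLaM-7B-Instruct-GGUF) into the current directory:

```bash
# Download a single GGUF file from the Hub rather than the whole repository.
pip install -U "huggingface_hub[cli]"
huggingface-cli download eltay89/ALLaM-7B-Instruct-GGUF \
  ALLaM-7B-Instruct-Q4_K_M.gguf --local-dir .
```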

3๏ธโƒฃ Run Inference

Use llama.cpp to generate responses. The example prompt asks, in Arabic, "How do I make a cup of tea?":

```bash
./main -m ALLaM-7B-Instruct-Q4_0.gguf -p "كيف أجهز كوب شاهي؟"
```

Expected Output:

```text
لتحضير كوب شاي، اغلي الماء، ضع الشاي في الكوب، واسكب الماء الساخن فوقه. اتركه لدقائق ثم استمتع بمذاقه!
```

(English: "To prepare a cup of tea, boil the water, put the tea in the cup, and pour the hot water over it. Let it steep for a few minutes, then enjoy the taste!")
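
The command above uses llama.cpp's defaults. In practice you will usually want to pin the thread count, context size, and sampling settings; below is a sketch with common llama.cpp flags (the values are illustrative, not tuned for this model):

```bash
# -n: max new tokens, -c: context size, -t: CPU threads,
# -ngl: layers offloaded to the GPU (requires a GPU-enabled build),
# --temp: sampling temperature.
./main -m ALLaM-7B-Instruct-Q4_K_M.gguf \
  -p "كيف أجهز كوب شاهي؟" \
  -n 256 -c 4096 -t 8 -ngl 35 --temp 0.7
```

On newer CMake builds, substitute ./build/bin/llama-cli for ./main.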

📊 Benchmarks & Performance

| Quantization Format | Model Size | CPU (Tokens/sec) | GPU (Tokens/sec) |
|---------------------|------------|------------------|------------------|
| FP16                | Large      | ~2               | ~15              |
| Q8_0                | Medium     | ~4               | ~30              |
| Q6_K                | Smaller    | ~6               | ~40              |
| Q5_0                | Small      | ~7               | ~42              |
| Q5_K_M              | Smaller    | ~8               | ~45              |
| Q4_0                | Very Small | ~9               | ~48              |
| Q4_K_M              | Very Small | ~10              | ~50              |
| Q2_K                | Smallest   | ~12              | ~55              |

Performance may vary based on hardware and configuration.
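
To measure throughput on your own hardware rather than relying on the table above, llama.cpp ships a llama-bench utility; a minimal sketch:

```bash
# Reports prompt-processing and token-generation speed for the model;
# -p and -n set the prompt and generation lengths in tokens.
./llama-bench -m ALLaM-7B-Instruct-Q4_K_M.gguf -p 512 -n 128
```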


📜 License

This model follows the ALLaM-AI license. Refer to their Hugging Face repository for details.

โค๏ธ Acknowledgments

  • ALLaM-AI for developing the original ALLaM-7B-Instruct model.
  • llama.cpp by ggerganov for optimized inference.

โญ Contributions & Feedback

If you find this quantized model useful, feel free to contribute, provide feedback, or share your results!

