C4AI Command R7B Arabic - Quantized Versions in GGUF Format

This repository contains quantized versions of the C4AI Command R7B Arabic model, provided in GGUF format. These quantized versions are designed to reduce model size and improve inference speed while maintaining reasonable performance.

Available Quantized Versions

The following GGUF quantized versions are available:

  • Q2_K
  • Q3_K_M
  • Q4_0
  • Q4_K_M
  • Q5_K_S
  • Q5_K_M
  • Q6_K
  • Q8_0
  • F16 (full 16-bit precision, effectively unquantized)

Original Repository

The original model was developed by Cohere and Cohere For AI. You can find it here:

https://huggingface.co/CohereForAI/c4ai-command-r7b-arabic-02-2025

License

These quantized versions follow the same licensing terms as the original model: CC-BY-NC-4.0, with an additional requirement to comply with C4AI’s Acceptable Use Policy. By using these models, you agree to abide by these terms.

Available Models

The GGUF files available in this repository are listed below:

Quantization   File Name
------------   -----------------------------------------
Q2_K           c4ai-command-r7b-arabic-02-2025-Q2_K.gguf
Q3_K_M         c4ai-command-r7b-arabic-02-2025-Q3_K_M.gguf
Q4_0           c4ai-command-r7b-arabic-02-2025-Q4_0.gguf
Q4_K_M         c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf
Q5_K_S         c4ai-command-r7b-arabic-02-2025-Q5_K_S.gguf
Q5_K_M         c4ai-command-r7b-arabic-02-2025-Q5_K_M.gguf
Q6_K           c4ai-command-r7b-arabic-02-2025-Q6_K.gguf
Q8_0           c4ai-command-r7b-arabic-02-2025-Q8_0.gguf
F16            c4ai-command-r7b-arabic-02-2025-F16.gguf

Installation

You can run these GGUF models with any of the following tools:

1. llama-cpp-python (Python Library)

Install with:

pip install llama-cpp-python
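
If you have a compatible GPU, llama-cpp-python can be compiled with hardware acceleration enabled. A minimal sketch, assuming a CUDA toolchain is installed (the CMake flag name has changed between releases, so check the llama-cpp-python documentation for your version):

CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python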

2. llama.cpp (C++ Library)

If you prefer a non-Python workflow, you can use the llama.cpp C++ implementation.
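
A minimal sketch of building llama.cpp from source and running one of these files from the command line (build steps and binary names have varied between releases; older versions used make and a binary named main):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run an interactive generation with one of the quantized files
./build/bin/llama-cli -m ./c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf -p "مرحبا، كيف حالك؟" -n 100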

3. LM Studio (GUI Interface)

LM Studio provides an easy-to-use graphical interface for running GGUF models locally. You can download it from:

https://lmstudio.ai

4. GPT4All (Cross-Platform GUI & CLI)

GPT4All supports running GGUF models across various operating systems. You can install it from:

https://gpt4all.io
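
GPT4All also ships a Python SDK. A minimal sketch, assuming the gpt4all package is installed and the GGUF file sits in the current directory; note that GPT4All bundles its own llama.cpp build, so whether this model loads depends on that build supporting the cohere2 architecture:

from gpt4all import GPT4All

# Point GPT4All at a locally downloaded GGUF instead of letting it download one
model = GPT4All(
    model_name="c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf",
    model_path=".",          # directory containing the .gguf file
    allow_download=False,    # never fetch from the network
)

with model.chat_session():
    print(model.generate("مرحبا، كيف حالك؟", max_tokens=100))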

5. Ollama (Local Model Runner)

Ollama is a lightweight tool for running LLMs locally; a complete Modelfile setup for this model is given in the Ollama Chat Template section below. Download it from:

https://ollama.com

Downloading the Models

You can download the GGUF files from this repository using the huggingface_hub library:

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="eltay89/c4ai-command-r7b-arabic-02-2025-gguf",
    filename="c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf",
    local_dir="."
)

Alternatively, download the files directly from the repository’s page on Hugging Face.
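
The same download also works from the command line, using the huggingface-cli tool that ships with huggingface_hub:

huggingface-cli download eltay89/c4ai-command-r7b-arabic-02-2025-gguf \
  c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf --local-dir .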

Usage

Using llama-cpp-python in Python

from llama_cpp import Llama

# Load the model (replace with the path to your downloaded GGUF file)
llm = Llama(model_path="path/to/c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf")

# Generate text
output = llm("مرحبا، كيف حالك؟", max_tokens=100, temperature=0.3)
print(output['choices'][0]['text'])

Replace "path/to/c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf" with the actual path to your downloaded GGUF file.

The prompt "مرحبا، كيف حالك؟" translates to "Hello, how are you?" in Arabic.
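
For multi-turn chat, llama-cpp-python's create_chat_completion applies the chat template embedded in the GGUF metadata, so you don't have to insert the special turn tokens by hand. A minimal sketch (if your file lacks template metadata, Llama also accepts an explicit chat_format argument):

from llama_cpp import Llama

llm = Llama(
    model_path="path/to/c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf",
    n_ctx=4096,  # context window; raise or lower to fit your memory budget
)

# Messages are formatted with the model's own chat template before inference
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "مرحبا، كيف حالك؟"},
    ],
    max_tokens=100,
    temperature=0.3,
)
print(response["choices"][0]["message"]["content"])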

Chat Templates

LM Studio Chat Template

To get clean, conversational outputs in LM Studio, use this chat template:

Before System: <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>  
After System: <|END_OF_TURN_TOKEN|>  
Before User: <|START_OF_TURN_TOKEN|><|USER_TOKEN|>  
After User: <|END_OF_TURN_TOKEN|>  
Before Assistant: <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>  
After Assistant: <|END_OF_TURN_TOKEN|>  
Additional Stop Strings: <|END_RESPONSE|>, <|END_OF_TURN_TOKEN|>, <|START_THINKING|>, <|END_THINKING|>, <|START_ACTION|>, <|END_ACTION|>, <|START_TOOL_RESULT|>, <|END_TOOL_RESULT|>

Ollama Chat Template

For Ollama, create a Modelfile with this content:

FROM ./c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf

TEMPLATE """
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{ .System }}<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{ .Prompt }}<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>{{ .Response }}<|END_OF_TURN_TOKEN|>
"""

PARAMETER stop "<|END_RESPONSE|>"
PARAMETER stop "<|END_OF_TURN_TOKEN|>"
PARAMETER stop "<|START_THINKING|>"
PARAMETER stop "<|END_THINKING|>"
PARAMETER stop "<|START_ACTION|>"
PARAMETER stop "<|END_ACTION|>"
PARAMETER stop "<|START_TOOL_RESULT|>"
PARAMETER stop "<|END_TOOL_RESULT|>"

Run:

ollama create c4ai-command-r7b-arabic -f Modelfile
ollama run c4ai-command-r7b-arabic
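
Once the model is created, Ollama also exposes it over a local HTTP API (port 11434 by default). A minimal sketch using the requests library:

import requests

# Non-streaming generation request against the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "c4ai-command-r7b-arabic",
        "prompt": "مرحبا، كيف حالك؟",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["response"])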

Contact

For questions or issues, please refer to the original repository or contact Cohere For AI at info@for.ai.

Model Details

Format: GGUF
Model size: 8.03B parameters
Architecture: cohere2
