C4AI Command R7B Arabic - Quantized Versions in GGUF Format

This repository contains quantized versions of the C4AI Command R7B Arabic model, provided in GGUF format. These quantized versions are designed to reduce model size and improve inference speed while maintaining reasonable performance.

Available Quantized Versions

The following GGUF quantized versions are available:

  • Q2_K
  • Q3_K_M
  • Q4_0
  • Q4_K_M
  • Q5_K_S
  • Q5_K_M
  • Q6_K
  • Q8_0
  • F16 (full 16-bit precision, effectively unquantized)

Original Repository

The original model was developed by Cohere and Cohere For AI. You can find it here:

https://huggingface.co/CohereForAI/c4ai-command-r7b-arabic-02-2025

License

These quantized versions follow the same licensing terms as the original model: CC-BY-NC-4.0, with an additional requirement to comply with C4AI’s Acceptable Use Policy. By using these models, you agree to abide by these terms.

Available Models

The GGUF files available in this repository are listed below:

Quantization   File Name
------------   -----------------------------------------
Q2_K           c4ai-command-r7b-arabic-02-2025-Q2_K.gguf
Q3_K_M         c4ai-command-r7b-arabic-02-2025-Q3_K_M.gguf
Q4_0           c4ai-command-r7b-arabic-02-2025-Q4_0.gguf
Q4_K_M         c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf
Q5_K_S         c4ai-command-r7b-arabic-02-2025-Q5_K_S.gguf
Q5_K_M         c4ai-command-r7b-arabic-02-2025-Q5_K_M.gguf
Q6_K           c4ai-command-r7b-arabic-02-2025-Q6_K.gguf
Q8_0           c4ai-command-r7b-arabic-02-2025-Q8_0.gguf
F16            c4ai-command-r7b-arabic-02-2025-F16.gguf

Installation

You can run these GGUF models with any of the following tools:

1. llama-cpp-python (Python Library)

Install with:

pip install llama-cpp-python
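
If you have a compatible GPU, llama-cpp-python can be compiled with hardware acceleration enabled. A minimal sketch, assuming a CUDA toolchain is installed (the CMake flag name has changed between releases, so check the llama-cpp-python documentation for your version):

CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python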

2. llama.cpp (C++ Library)

If you prefer a non-Python workflow, you can use the llama.cpp C++ implementation.
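
A minimal sketch of building llama.cpp from source and running one of these files from the command line (build steps and binary names have varied between releases; older versions used make and a binary named main):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run an interactive generation with one of the quantized files
./build/bin/llama-cli -m ./c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf -p "مرحبا، كيف حالك؟" -n 100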

3. LM Studio (GUI Interface)

LM Studio provides an easy-to-use graphical interface for running GGUF models locally. You can download it from:

https://lmstudio.ai

4. GPT4All (Cross-Platform GUI & CLI)

GPT4All supports running GGUF models across various operating systems. You can install it from:

https://gpt4all.io
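
GPT4All also ships a Python SDK. A minimal sketch, assuming the gpt4all package is installed and the GGUF file sits in the current directory; note that GPT4All bundles its own llama.cpp build, so whether this model loads depends on that build supporting the cohere2 architecture:

from gpt4all import GPT4All

# Point GPT4All at a locally downloaded GGUF instead of letting it download one
model = GPT4All(
    model_name="c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf",
    model_path=".",          # directory containing the .gguf file
    allow_download=False,    # never fetch from the network
)

with model.chat_session():
    print(model.generate("مرحبا، كيف حالك؟", max_tokens=100))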

5. Ollama (Local Model Runner)

Ollama is a lightweight tool for running LLMs locally; a complete Modelfile setup for this model is given in the Ollama Chat Template section below. Download it from:

https://ollama.com

Downloading the Models

You can download the GGUF files from this repository using the huggingface_hub library:

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="eltay89/c4ai-command-r7b-arabic-02-2025-gguf",
    filename="c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf",
    local_dir="."
)

Alternatively, download the files directly from the repository’s page on Hugging Face.
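
The same download also works from the command line, using the huggingface-cli tool that ships with huggingface_hub:

huggingface-cli download eltay89/c4ai-command-r7b-arabic-02-2025-gguf \
  c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf --local-dir .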

Usage

Using llama-cpp-python in Python

from llama_cpp import Llama

# Load the model (replace with the path to your downloaded GGUF file)
llm = Llama(model_path="path/to/c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf")

# Generate text
output = llm("مرحبا، كيف حالك؟", max_tokens=100, temperature=0.3)
print(output['choices'][0]['text'])

Replace "path/to/c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf" with the actual path to your downloaded GGUF file.

The prompt "مرحبا، كيف حالك؟" translates to "Hello, how are you?" in Arabic.
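
For multi-turn chat, llama-cpp-python's create_chat_completion applies the chat template embedded in the GGUF metadata, so you don't have to insert the special turn tokens by hand. A minimal sketch (if your file lacks template metadata, Llama also accepts an explicit chat_format argument):

from llama_cpp import Llama

llm = Llama(
    model_path="path/to/c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf",
    n_ctx=4096,  # context window; raise or lower to fit your memory budget
)

# Messages are formatted with the model's own chat template before inference
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "مرحبا، كيف حالك؟"},
    ],
    max_tokens=100,
    temperature=0.3,
)
print(response["choices"][0]["message"]["content"])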

Chat Templates

LM Studio Chat Template

To get clean, conversational outputs in LM Studio, use this chat template:

Before System: <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>  
After System: <|END_OF_TURN_TOKEN|>  
Before User: <|START_OF_TURN_TOKEN|><|USER_TOKEN|>  
After User: <|END_OF_TURN_TOKEN|>  
Before Assistant: <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>  
After Assistant: <|END_OF_TURN_TOKEN|>  
Additional Stop Strings: <|END_RESPONSE|>, <|END_OF_TURN_TOKEN|>, <|START_THINKING|>, <|END_THINKING|>, <|START_ACTION|>, <|END_ACTION|>, <|START_TOOL_RESULT|>, <|END_TOOL_RESULT|>

Ollama Chat Template

For Ollama, create a Modelfile with this content:

FROM ./c4ai-command-r7b-arabic-02-2025-Q4_K_M.gguf

TEMPLATE """
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{ .System }}<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{ .Prompt }}<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>{{ .Response }}<|END_OF_TURN_TOKEN|>
"""

PARAMETER stop "<|END_RESPONSE|>"
PARAMETER stop "<|END_OF_TURN_TOKEN|>"
PARAMETER stop "<|START_THINKING|>"
PARAMETER stop "<|END_THINKING|>"
PARAMETER stop "<|START_ACTION|>"
PARAMETER stop "<|END_ACTION|>"
PARAMETER stop "<|START_TOOL_RESULT|>"
PARAMETER stop "<|END_TOOL_RESULT|>"

Run:

ollama create c4ai-command-r7b-arabic -f Modelfile
ollama run c4ai-command-r7b-arabic
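
Once the model is created, Ollama also exposes it over a local HTTP API (port 11434 by default). A minimal sketch using the requests library:

import requests

# Non-streaming generation request against the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "c4ai-command-r7b-arabic",
        "prompt": "مرحبا، كيف حالك؟",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["response"])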

Contact

For questions or issues, please refer to the original repository or contact Cohere For AI at info@for.ai.

Model Details

Format: GGUF
Model size: 8.03B parameters
Architecture: cohere2
