Averroes-Q 🏛️

A 7B bilingual (Arabic/English) language model, LoRA fine-tuned on 1.3M+ curated Arabic documents from classical and modern sources.

Built by Hayula (هيولة) — Open Arabic AI.

Model Details

Property	Value
Base	Qwen 2.5 7B (4-bit → fp16)
Fine-tune	LoRA (rank=8, scale=20, 16 layers)
Training Data	1.3M+ Arabic documents (8.2 GB)
Validation Data	400K+ Arabic documents (422 MB)
Training Steps	1,000 iterations
Context Length	2,048 tokens
Final Val Loss	1.440
Hardware	Apple M2 Ultra (192 GB)
Format	Safetensors (fp16)
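
The table gives a LoRA rank and scale but not how they enter the computation. The following is a minimal sketch, assuming the convention in which the scale multiplies the low-rank product directly (some libraries use alpha divided by rank instead). The toy dimensions, the helper names `matmul` and `lora_forward`, and the random values are illustrative and are not taken from the actual training code.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w, lora_a, lora_b, scale):
    """Frozen base projection plus a scaled low-rank correction:
    x @ w + scale * (x @ lora_a) @ lora_b."""
    base = matmul(x, w)
    update = matmul(matmul(x, lora_a), lora_b)
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

RANK, SCALE = 8, 20.0   # rank and scale as listed in the table
D_IN, D_OUT = 64, 64    # toy sizes for illustration, not the model's real dimensions

rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(D_IN)]]                        # one input row
w = [[rng.uniform(-1, 1) for _ in range(D_OUT)] for _ in range(D_IN)]  # frozen weight
lora_a = [[rng.uniform(-0.1, 0.1) for _ in range(RANK)] for _ in range(D_IN)]
lora_b = [[0.0] * D_OUT for _ in range(RANK)]  # zero init: the adapter starts as a no-op

y_base = matmul(x, w)
y_lora = lora_forward(x, w, lora_a, lora_b, SCALE)

trainable = D_IN * RANK + RANK * D_OUT  # adapter parameters for this one toy layer
frozen = D_IN * D_OUT                   # base parameters for the same layer
print(f"trainable: {trainable}, frozen: {frozen}")
```

With a zero-initialized `lora_b`, the adapted output equals the base output, so training starts from the base model's behavior and only the two small matrices receive updates.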

Training Data

The corpus is a curated collection of high-quality Arabic texts sourced from:

Classical Islamic texts
Modern Arabic encyclopedias
Scientific papers and technical documentation
Arabic literature and journalism

Usage

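from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the weights and the matching tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("BinSaqban/Averroes-Q")
tokenizer = AutoTokenizer.from_pretrained("BinSaqban/Averroes-Q")

# Prompt in Arabic: "What is the Arabic language?"
prompt = "ما هي اللغة العربية؟"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode the full sequence (prompt plus continuation) without special tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))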
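
Before loading the model, it can help to estimate how much memory the weights alone will need. The sketch below assumes 2 bytes per parameter for fp16 and a parameter count of about 7.6 billion, which is an assumption for a Qwen 2.5 7B-class model and not a figure stated in this card. Activations and the KV cache come on top of this.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone; fp16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param / 2**30

# 7.6e9 is an assumed parameter count, not a value taken from the model card
fp16_gib = weight_memory_gib(7.6e9)
print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
```

Passing `bytes_per_param=4` gives the corresponding estimate for fp32 weights.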

Limitations

This is an experimental release. The model inherits the Qwen 2.5 architecture and capabilities and has been fine-tuned for Arabic fluency.
Training was limited to 1,000 LoRA steps on consumer hardware.
The model may exhibit biases present in the training data.
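
To put the 1,000-step limit in perspective, an upper bound on the number of tokens seen during fine-tuning can be sketched from the step count and context length. The batch size is not stated in this card, so the value used here is purely hypothetical.

```python
def max_tokens_seen(steps: int, batch_size: int, context_length: int) -> int:
    """Upper bound on training tokens: every sequence in every batch
    is assumed to be filled to the context limit."""
    return steps * batch_size * context_length

STEPS = 1_000            # from the model card
CONTEXT_LENGTH = 2_048   # from the model card
BATCH_SIZE = 4           # hypothetical: the card does not state the batch size

upper_bound = max_tokens_seen(STEPS, BATCH_SIZE, CONTEXT_LENGTH)
print(f"at most {upper_bound:,} tokens seen during fine-tuning")
```

If the real batch size were of this order, the run would have visited only a small part of the 8.2 GB training corpus, which is consistent with treating this release as experimental.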

About Hayula

Hayula (هيولة = awesome/cool) is an open Arabic AI lab. We build state-of-the-art language models for Arabic and beyond, named after the scholars who shaped civilization.

Model tree for BinSaqban/Averroes-Q

Base model: mlx-community/Qwen2.5-7B-4bit