Averroes-Q 🏛️

A 7B bilingual (Arabic/English) language model, LoRA fine-tuned on 1.3M+ curated Arabic documents from classical and modern sources.

Built by Hayula (هيولة): Open Arabic AI.

Model Details

| Property | Value |
|---|---|
| Base | Qwen 2.5 7B (4-bit → fp16) |
| Fine-tune | LoRA (rank=8, scale=20, 16 layers) |
| Training Data | 1.3M+ Arabic documents (8.2 GB) |
| Validation Data | 400K+ Arabic docs (422 MB) |
| Training Steps | 1,000 iterations |
| Context Length | 2,048 tokens |
| Final Val Loss | 1.440 |
| Hardware | Apple M2 Ultra (192 GB) |
| Format | Safetensors (fp16) |
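To make the LoRA settings above more concrete, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer. It is an illustration under stated assumptions, not the training code used for this model: it assumes `scale` multiplies the low-rank update directly (some LoRA implementations use alpha/rank instead), the layer sizes are hypothetical toy values, and the card does not say which projections inside the 16 adapted layers carry adapters.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scale):
    """Return W x + scale * B (A x).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (rank, d_in)
    B: trainable up-projection, shape (d_out, rank)
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

def lora_trainable_params(d_in, d_out, rank):
    """Number of trainable values the adapter adds to one linear layer."""
    return rank * (d_in + d_out)

# Hypothetical toy sizes; only rank=8 and scale=20 come from the table above.
d_in, d_out, rank, scale = 16, 16, 8, 20.0
rng = random.Random(0)
W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [rng.gauss(0, 1) for _ in range(d_in)]

# With B at zero, the adapted layer reproduces the frozen base layer exactly.
assert lora_forward(W, A, B, x, scale) == matvec(W, x)
print(lora_trainable_params(d_in, d_out, rank))  # 8 * (16 + 16) = 256
```

Only A and B are trained while W stays frozen, which is why a low rank keeps the number of trainable values small relative to the base layer.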

Training Data

The corpus is a curated collection of high-quality Arabic texts sourced from:

  • Classical Islamic texts
  • Modern Arabic encyclopedias
  • Scientific papers and technical documentation
  • Arabic literature and journalism

Usage

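from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned weights and the matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("BinSaqban/Averroes-Q")
tokenizer = AutoTokenizer.from_pretrained("BinSaqban/Averroes-Q")

# Arabic prompt: "What is the Arabic language?"
prompt = "ما هي اللغة العربية؟"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 256 new tokens and print the text without special tokens.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))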
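The Model Details list a context length of 2,048 tokens. The card does not say whether this is a hard limit at inference time, so the following is only a sketch, assuming you want to keep each input within that length. The helper `split_into_windows` is a hypothetical name introduced here, not part of any library; it works on a plain list of token IDs.

```python
def split_into_windows(token_ids, max_len=2048, overlap=0):
    """Split a list of token IDs into consecutive windows of at most max_len.

    overlap is the number of tokens shared between neighbouring windows.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        # Stop once this window already reaches the end of the input.
        if start + max_len >= len(token_ids):
            break
    return windows

# Stand-in for real token IDs, e.g. tokenizer(text)["input_ids"].
ids = list(range(5000))
print([len(w) for w in split_into_windows(ids)])  # [2048, 2048, 904]
```

A non-zero `overlap` repeats the tail of one window at the head of the next, which can help preserve context across the split at the cost of processing some tokens twice.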

Limitations

  • This is an experimental release. The model inherits the Qwen 2.5 architecture and capabilities and is fine-tuned for Arabic fluency.
  • Training was limited to 1,000 LoRA steps on consumer hardware.
  • The model may exhibit biases present in the training data.

About Hayula

Hayula (هيولة = awesome/cool) is an open Arabic AI lab. We build state-of-the-art language models for Arabic and beyond, named after the scholars who shaped civilization.
