Averroes-Q
A 7B bilingual (Arabic/English) language model, LoRA fine-tuned on 1.3M+ curated Arabic documents from classical and modern sources.
Built by Hayula (هيولة): Open Arabic AI.
Model Details
| Property | Value |
|---|---|
| Base | Qwen 2.5 7B (4-bit → fp16) |
| Fine-tune | LoRA (rank=8, scale=20, 16 layers) |
| Training Data | 1.3M+ Arabic documents (8.2 GB) |
| Validation Data | 400K+ Arabic documents (422 MB) |
| Training Steps | 1,000 iterations |
| Context Length | 2,048 tokens |
| Final Val Loss | 1.440 |
| Hardware | Apple M2 Ultra (192 GB) |
| Format | Safetensors (fp16) |
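Two figures in the table are easier to interpret with a little arithmetic. The sketch below converts the final validation loss into a perplexity and counts the parameters a rank-8 LoRA adapter adds to a single linear layer. It assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state, and the 3584 x 3584 layer shape is an illustrative assumption rather than a confirmed detail of the training setup.

```python
import math


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two low-rank factors: A (d_in x rank) and B (rank x d_out).
    """
    return rank * (d_in + d_out)


# Final validation loss from the table (assumed to be in nats per token).
final_val_loss = 1.440
ppl = perplexity_from_loss(final_val_loss)
print(f"validation perplexity: {ppl:.2f}")

# Hypothetical square projection layer; the shape is an assumption.
d_in, d_out, rank = 3584, 3584, 8
adapter = lora_param_count(d_in, d_out, rank)
full = d_in * d_out
print(f"adapter params: {adapter:,} ({adapter / full:.2%} of the full layer)")
```

Under these assumptions the validation perplexity comes out to roughly 4.22, and the adapter trains well under 1% of the weights in the layer it wraps, which is what makes LoRA practical on a single machine.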
Training Data
The corpus is a curated collection of high-quality Arabic texts sourced from:
- Classical Islamic texts
- Modern Arabic encyclopedias
- Scientific papers and technical documentation
- Arabic literature and journalism
Usage
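from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned weights and the matching tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("BinSaqban/Averroes-Q")
tokenizer = AutoTokenizer.from_pretrained("BinSaqban/Averroes-Q")

# Arabic prompt: "What is the Arabic language?"
prompt = "ما هي اللغة العربية؟"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 256 new tokens, then decode without special tokens
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))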
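If you want the prompt and the generated tokens together to stay within the 2,048-token context length listed in the table, one option is to trim the prompt before generating. The helper below is a minimal sketch of that idea, not part of the released model: `fit_prompt_to_context` is a hypothetical name, it works on plain lists of token ids, and it keeps the most recent tokens on the assumption that the end of the prompt matters most.

```python
def fit_prompt_to_context(prompt_ids, max_new_tokens=256, context_length=2048):
    """Trim token ids from the left so prompt + generation fits the context.

    Hypothetical helper: keeps the last (context_length - max_new_tokens)
    ids and returns shorter prompts unchanged.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_length")
    return list(prompt_ids[-budget:])


# Stand-in token ids; in practice these would come from the tokenizer.
long_prompt = list(range(3000))
trimmed = fit_prompt_to_context(long_prompt)
print(len(trimmed))

short_prompt = [1, 2, 3]
print(fit_prompt_to_context(short_prompt))
```

With the defaults above, the prompt budget is 2,048 - 256 = 1,792 tokens. In real use the ids could come from `tokenizer(prompt)["input_ids"]`; trimming from the left is a design choice, and a task that depends on the start of the prompt may want the opposite.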
Limitations
- This is an experimental release. The model inherits the Qwen 2.5 architecture and capabilities and is fine-tuned for Arabic fluency.
- Training was limited to 1,000 LoRA steps on consumer hardware.
- The model may exhibit biases present in the training data.
About Hayula
Hayula (هيولة = awesome/cool) is an open Arabic AI lab. We build state-of-the-art language models for Arabic and beyond, named after the scholars who shaped civilization.
Model tree for BinSaqban/Averroes-Q
Base model
mlx-community/Qwen2.5-7B-4bit