ajan-embed-q
A Turkish-optimized sentence embedding model for retrieval / RAG, distilled from
BAAI/bge-m3 into a multilingual-e5-base student over 500k Turkish web sentences.
The student's 768-dim output is projected to 1024 dims to match the teacher's embedding space.
🔗 Code + recipe: github.com/AJANLAR-AI/ajanlar
Part of Ajanlar, an open Turkish AI agent infrastructure. This model is the retrieval engine under the agents, built for a setting where multilingual models underperform on agglutinative Turkish.
Benchmark (MTEB, Turkish) — with ablation
Main score per task: NDCG@10 for retrieval, Spearman correlation for STS. The multilingual-e5-base row is the
undistilled ablation (same base as this model).
| Model | Params | TurHistQuad (retrieval) | STS22.v2 | STS17 | avg |
|---|---|---|---|---|---|
| ajan-embed-q (this) | 278M | 0.465 | 0.651 | 0.724 | 0.613 |
| multilingual-e5-small | 118M | 0.433 | 0.643 | 0.767 | 0.614 |
| multilingual-e5-base (ablation) | 278M | 0.444 | 0.651 | 0.777 | 0.624 |
| multilingual-e5-large | 560M | 0.469 | 0.675 | 0.810 | 0.652 |
| BAAI/bge-m3 (teacher) | 568M | 0.478 | 0.680 | 0.814 | 0.657 |
Honest reading: this is a retrieval-specialised model. On Turkish retrieval (TurHistQuad) it beats e5-small and e5-base (0.465 vs 0.433 and 0.444) and comes within 0.004 of e5-large (0.469) at half the parameter count, which is the RAG/agent use case it is built for. On STS it ties e5-base on STS22.v2 (0.651) but trails the whole e5 family on STS17 (0.724 vs 0.767 to 0.810), so on the 3-task average it lands slightly below undistilled e5-base (0.613 vs 0.624). Use it for retrieval/RAG in Turkish, not as a general-purpose STS model.
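The avg column can be sanity-checked from the per-task scores. The snippet below is a minimal sketch that assumes avg is the unweighted mean of the three task columns; the variable and function names are illustrative, and the numbers are copied from the table. Because the table shows scores already rounded to 3 decimals, a recomputed average can differ from the reported one by 0.001.

```python
# Per-task scores copied from the benchmark table:
# (TurHistQuad NDCG@10, STS22.v2 Spearman, STS17 Spearman)
scores = {
    "ajan-embed-q": (0.465, 0.651, 0.724),
    "multilingual-e5-small": (0.433, 0.643, 0.767),
    "multilingual-e5-base": (0.444, 0.651, 0.777),
    "multilingual-e5-large": (0.469, 0.675, 0.810),
    "BAAI/bge-m3": (0.478, 0.680, 0.814),
}

# The avg column as reported in the table
reported_avg = {
    "ajan-embed-q": 0.613,
    "multilingual-e5-small": 0.614,
    "multilingual-e5-base": 0.624,
    "multilingual-e5-large": 0.652,
    "BAAI/bge-m3": 0.657,
}


def task_average(task_scores):
    """Unweighted mean over tasks, rounded to 3 decimals like the table."""
    return round(sum(task_scores) / len(task_scores), 3)


averages = {name: task_average(s) for name, s in scores.items()}

for name, avg in averages.items():
    diff = abs(avg - reported_avg[name])
    print(f"{name:24s} recomputed={avg:.3f} reported={reported_avg[name]:.3f} diff={diff:.3f}")
```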
Usage
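```python
from sentence_transformers import SentenceTransformer

m = SentenceTransformer("fredoline005/ajan-embed-q")

# Query: "When will my package arrive?"
# Passage: "Orders are shipped within 1-3 business days."
emb = m.encode(
    ["query: kargom ne zaman gelir?",
     "passage: Siparişler 1–3 iş günü içinde kargoya verilir."],
    normalize_embeddings=True,  # unit-length vectors, so dot product = cosine
)
```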
Use query: / passage: prefixes for retrieval (inherited from the e5 family).
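To make the prefix convention concrete, here is a minimal, dependency-free sketch of prefixing texts and ranking passages by cosine similarity. The helper names (`add_prefix`, `l2_normalize`, `rank_passages`) are hypothetical and not part of sentence-transformers, and the 3-dim vectors are toy stand-ins for real `m.encode(...)` output.

```python
import math


def add_prefix(texts, kind):
    """Prepend the e5-style 'query: ' or 'passage: ' prefix to each text."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by cosine similarity, best first."""
    q = l2_normalize(query_vec)
    scored = []
    for i, p in enumerate(passage_vecs):
        p = l2_normalize(p)
        scored.append((sum(a * b for a, b in zip(q, p)), i))
    return [i for _, i in sorted(scored, key=lambda s: s[0], reverse=True)]


queries = add_prefix(["kargom ne zaman gelir?"], "query")

# Toy vectors standing in for embeddings of one query and three passages
toy_query = [1.0, 0.0, 0.0]
toy_passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
order = rank_passages(toy_query, toy_passages)
print(queries, order)
```

With real embeddings produced using `normalize_embeddings=True`, the vectors are already unit length, so the explicit normalization step here is redundant and a plain dot product gives the same ranking.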
Training
- Method: offline distillation — teacher embeddings cached over the corpus, the student trained (MSE) to reproduce them; a Dense layer projects 768→1024.
- Teacher: BAAI/bge-m3 (MIT). Student: intfloat/multilingual-e5-base (MIT).
- Data: 500k Turkish sentences (allenai/c4, tr).
- Hardware: 1× RTX 4090, ~1 hour.
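The distillation objective described in the Method bullet can be written out as follows. This is a sketch of a standard embedding-MSE loss under stated assumptions, not a formula taken from the training code: the symbols are introduced here for illustration, and the presence of a bias term $b$ in the Dense layer and the exact normalisation constant are assumptions.

$$
\mathcal{L}(\theta, W, b) = \frac{1}{N} \sum_{i=1}^{N} \left\lVert W f_\theta(x_i) + b - t_i \right\rVert_2^2
$$

Here $x_i$ is a training sentence, $f_\theta(x_i) \in \mathbb{R}^{768}$ is the student's sentence embedding, $W \in \mathbb{R}^{1024 \times 768}$ and $b \in \mathbb{R}^{1024}$ form the Dense projection, and $t_i \in \mathbb{R}^{1024}$ is the cached teacher embedding of the same sentence. A typical MSE implementation also averages over the 1024 output dimensions, which only rescales the loss by a constant.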
Limitations (honest)
- Benchmarked on 3 Turkish tasks (one retrieval, two STS) with no confidence intervals; margins are modest.
- The win over e5-small is partly due to the larger base: against the undistilled e5-base ablation, the retrieval gain narrows to 0.465 vs 0.444, and e5-base is ahead on STS17.
- Distillation is bounded by the teacher; the native-Turkish-tokenizer edge is v1.
- For a production retrain, pin a corpus snapshot and add PII and deduplication filtering.
License
Apache-2.0 (weights/recipe). Base + teacher are MIT.