rudore/colqwen3-2b
LoRA adapter for Russian-language business-document retrieval,
trained on top of
VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1.
The published artifact is the adapter only (~150 MB); load it onto the
base model with peft.PeftModel.from_pretrained.
On the industrial X5-private holdout (Russian business documents,
rephrased queries), rudore/colqwen3-2b leads the system ranking at
nDCG@5 = 0.8436 and beats the strongest dense-text baseline
(pplx-embed-v1-4B at 0.7683) by +0.0753 absolute, despite using
half the parameter count.
Training recipe
- Loss. ColbertPairwiseCELoss, temperature τ = 0.02, in-batch only (num_negs = 0).
- Batch. per_device_train_batch_size = 16, gradient_accumulation_steps = 1.
- LoRA. rank r = 32, α = 32, dropout = 0.1, Gaussian init, bias none, targets (q/k/v/o_proj | down/gate/up_proj | custom_text_proj).
- Optimizer. paged_adamw_8bit, learning_rate = 5e-5, warmup_steps = 500, weight_decay = 0.0, 1 epoch (~11k optimizer steps).
- Sampler. Group-aware sampler, --sampler-seed 42.
- Training data. RuDoRe rephrased train split (rudore/RuDoRe publishes the eval split only).
- Checkpoint selection. Best by nDCG@5 on a held-out RuDoRe validation shard, then re-scored on the final test set.
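To make the loss setting in the recipe concrete, here is a minimal pure-Python sketch of a pairwise cross-entropy objective with in-batch negatives. It assumes the common formulation in which the hardest in-batch negative is compared against the positive through a softplus scaled by the temperature; the actual ColbertPairwiseCELoss implementation may differ in details such as score normalization. The function names and the toy score matrix are illustrative, not taken from the training code.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ce_loss(scores, temperature=0.02):
    """Assumed pairwise CE formulation over a square score matrix.

    scores[i][j] is the late-interaction score of query i against document j.
    The diagonal holds the positive pairs; every other document in the row
    acts as an in-batch negative (no mined negatives, i.e. num_negs = 0).
    Requires a batch of at least two pairs.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Hardest in-batch negative for query i.
        neg = max(scores[i][j] for j in range(n) if j != i)
        total += softplus((neg - pos) / temperature)
    return total / n


# Toy batch of two query/document pairs (hypothetical scores).
well_separated = [[0.9, 0.2], [0.1, 0.8]]
confused = [[0.2, 0.9], [0.8, 0.1]]
loss_good = pairwise_ce_loss(well_separated)
loss_bad = pairwise_ce_loss(confused)
```

With a temperature as small as 0.02, even a small margin between the positive and the hardest negative drives the per-query loss close to zero, while a violated margin is penalized roughly linearly in the scaled gap.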
Evaluation
Headline metric is nDCG@5. ViDoRe V3 is the macro average over its seven sub-tasks.
| Slice | Base | Fine-tuned | Δ |
|---|---|---|---|
| ViDoRe V3 (OOD, 7 tasks, macro) | 0.5366 | 0.5521 | +0.0156 |
| RuDoRe (in-domain) | 0.8394 | 0.8979 | +0.0585 |
| X5-private (industrial holdout) | 0.8208 | 0.8436 | +0.0228 |
All three Δ values are positive: the LoRA adaptation improves nDCG@5 on both Russian slices, and the out-of-domain ViDoRe V3 macro average rises rather than regresses, so the fine-tune shows no sign of catastrophic forgetting on that benchmark.
Hardware for all eval runs: NVIDIA H200, CUDA 12.8, PyTorch 2.8, FlashAttention 2.8.3.
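For readers who want to reproduce the headline metric by hand, the following is a minimal sketch of nDCG@5 for a single query and of the macro average across tasks. It assumes linear gain and a log2 rank discount, which is a common convention; the evaluation harness behind the table is not described in this card and may use a different variant. All function names and inputs are illustrative.

```python
import math


def dcg_at_k(relevances, k=5):
    # Discounted cumulative gain over the top-k items, ranks starting at 1.
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, all_relevances, k=5):
    """nDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved documents, in rank order.
    all_relevances: labels of every judged document, used for the ideal ranking.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def macro_average(per_task_scores):
    # Unweighted mean: each sub-task counts equally regardless of its size.
    return sum(per_task_scores) / len(per_task_scores)


# Hypothetical query with one relevant document retrieved at rank 2.
example = ndcg_at_k([0, 1, 0, 0, 0], [1, 0, 0, 0, 0])
```

A dataset-level score is the mean of the per-query values, and a macro score such as the ViDoRe V3 row is the unweighted mean of the per-task scores.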
Usage
```python
from peft import PeftModel
from sauerkrautlm_colpali.models import ColQwen3, ColQwen3Processor

BASE = "VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1"

# Load the base model, then attach the LoRA adapter on top of it.
base = ColQwen3.from_pretrained(
    BASE,
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)
model = PeftModel.from_pretrained(base, "rudore/colqwen3-2b")
processor = ColQwen3Processor.from_pretrained(BASE)
```
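Once query and page embeddings are available, retrieval reduces to scoring each page against the query. The sketch below assumes the model emits one vector per token or image patch and that relevance is computed with ColBERT-style MaxSim late interaction, as is typical for ColQwen-style retrievers. It operates on plain Python lists so that it does not depend on any processor or model method not shown in this card; `maxsim_score`, `rank_documents`, and the toy vectors are hypothetical.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector take the best dot
    product over all document vectors, then sum over the query vectors."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(query_vecs, docs):
    """Return (indices sorted from best to worst, raw scores per document)."""
    scores = [maxsim_score(query_vecs, doc) for doc in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy 2-dimensional embeddings (placeholders, not real model outputs).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query vectors
page_b = [[1.0, 0.0]]              # matches only the first query vector
order, scores = rank_documents(query, [page_b, page_a])
```

In practice the same computation is usually done batched on the GPU with tensor operations; the list-based version is only meant to show what is being computed.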
License
Apache 2.0, inherited from
VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1.