rudore/colqwen3-2b

LoRA adapter for Russian-language business-document retrieval, trained on top of VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1. Only the adapter is published (~150 MB); load it on top of the base model with peft.PeftModel.from_pretrained.

On the industrial X5-private holdout (Russian business documents, rephrased queries), rudore/colqwen3-2b ranks first among the evaluated systems with nDCG@5 = 0.8436. It beats the strongest dense-text baseline, pplx-embed-v1-4B (0.7683), by 0.0753 absolute while having half as many parameters (2B vs. 4B).

Training recipe

  • Loss. ColbertPairwiseCELoss, temperature τ = 0.02, in-batch negatives only (num_negs = 0).
  • Batch. per_device_train_batch_size = 16, gradient_accumulation_steps = 1.
  • LoRA. Rank r = 32, α = 32, dropout = 0.1, Gaussian init, bias none; target modules q_proj, k_proj, v_proj, o_proj, down_proj, gate_proj, up_proj and custom_text_proj.
  • Optimizer. paged_adamw_8bit, learning_rate = 5e-5, warmup_steps = 500, weight_decay = 0.0, 1 epoch (~11k optimizer steps).
  • Sampler. Group-aware sampler (--sampler-seed 42).
  • Training data. RuDoRe rephrased train split; the public rudore/RuDoRe release contains only the eval split.
  • Checkpoint selection. The checkpoint with the best nDCG@5 on a held-out RuDoRe validation shard was selected, then re-scored on the final test set.
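
The loss bullet combines ColBERT late-interaction (MaxSim) scoring with a pairwise softplus objective over in-batch negatives. Below is a minimal pure-Python sketch of that idea with τ = 0.02 and no extra negatives (num_negs = 0). It is not the ColbertPairwiseCELoss source: the function names are hypothetical, comparing each positive against the hardest in-batch negative is an assumption about the pairwise variant, and any score normalization (for example by query length) that the library may apply before the temperature is omitted.

```python
import math
from typing import List, Sequence

# One multi-vector embedding: a list of per-token vectors.
TokenVectors = Sequence[Sequence[float]]


def maxsim(query: TokenVectors, doc: TokenVectors) -> float:
    """ColBERT late interaction: best dot product per query token, summed over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(q_tok, d_tok)) for d_tok in doc)
        for q_tok in query
    )


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def in_batch_pairwise_loss(
    queries: List[TokenVectors],
    docs: List[TokenVectors],
    temperature: float = 0.02,
) -> float:
    """docs[i] is the positive for queries[i]; every other doc in the batch is a
    negative (num_negs = 0: no additional mined negatives)."""
    if len(queries) != len(docs) or len(queries) < 2:
        raise ValueError("need aligned query/doc lists with at least 2 pairs")
    total = 0.0
    for i, query in enumerate(queries):
        scores = [maxsim(query, doc) for doc in docs]
        positive = scores[i]
        # Assumption: penalize only the hardest in-batch negative.
        hardest_negative = max(s for j, s in enumerate(scores) if j != i)
        total += softplus((hardest_negative - positive) / temperature)
    return total / len(queries)
```

The small temperature makes the objective very sharp: in this sketch a score gap of 1 becomes a logit of ±50, so correctly ordered pairs contribute almost nothing and misordered ones contribute roughly 50 each.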

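The LoRA bullet maps onto keyword arguments of peft's LoraConfig (r, lora_alpha, lora_dropout, init_lora_weights, bias, target_modules). The sketch below records them as a plain dict, assuming the training script built its config this way, together with two helpers (hypothetical names) for standard LoRA arithmetic: with α = r = 32 the update scale α/r is 1.0, and each adapted d_in × d_out linear layer gains r · (d_in + d_out) trainable parameters.

```python
# Recipe values written as peft.LoraConfig keyword arguments
# (assumption: the actual training script may have built its config differently).
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "init_lora_weights": "gaussian",
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "down_proj", "gate_proj", "up_proj",
        "custom_text_proj",
    ],
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scale: the low-rank update B @ A is multiplied by alpha / r."""
    return lora_alpha / r


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))  # 1.0
print(lora_params_per_linear(2048, 2048, 32))  # hypothetical 2048 x 2048 layer: 131072
```

To use it, pass the dict to peft: `from peft import LoraConfig` and then `LoraConfig(**LORA_KWARGS)`. A plain list of names matches every module whose name ends with one of them, including vision-tower layers where names coincide. peft also accepts a regex string for target_modules, which is one way to restrict adaptation to the language model plus custom_text_proj; the card does not say which form this run used.
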
Evaluation

The headline metric is nDCG@5. The ViDoRe V3 score is the macro average over its seven sub-tasks.
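
For reference, nDCG@5 and the macro average can be sketched as below. This assumes linear gains (the relevance grade itself, as in trec_eval-style scoring), per-task scores averaged over judged queries, and an unweighted mean across tasks; the evaluation harness behind the numbers in the table is not named in this card, and all function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the gain at 0-based rank i is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: Sequence[str], qrels: Dict[str, float], k: int = 5) -> float:
    """nDCG@k for one query: DCG of the system ranking divided by DCG of the ideal ranking."""
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranking]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def task_ndcg_at_k(
    runs: Dict[str, Sequence[str]],
    qrels: Dict[str, Dict[str, float]],
    k: int = 5,
) -> float:
    """Mean nDCG@k over the judged queries of one task; a missing run scores 0."""
    return sum(ndcg_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()) / len(qrels)


def macro_average(task_scores: Dict[str, float]) -> float:
    """Unweighted mean over tasks, e.g. the seven ViDoRe V3 sub-tasks."""
    return sum(task_scores.values()) / len(task_scores)
```

A relevant page at rank 2 instead of rank 1 scores 1/log2(3) ≈ 0.63, and a relevant page ranked sixth contributes nothing to nDCG@5.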

Slice (nDCG@5)                     Base     Fine-tuned   Δ
ViDoRe V3 (OOD, 7 tasks, macro)    0.5366   0.5521       +0.0156
RuDoRe (in-domain)                 0.8394   0.8979       +0.0585
X5-private (industrial holdout)    0.8208   0.8436       +0.0228

All three Δ values are positive: the LoRA adaptation improves quality on both Russian slices, and the out-of-domain ViDoRe V3 macro rises rather than regresses, so there is no sign of catastrophic forgetting on this benchmark. All evaluations ran on an NVIDIA H200 with CUDA 12.8, PyTorch 2.8 and FlashAttention 2.8.3.

Usage

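```python
from peft import PeftModel
from sauerkrautlm_colpali.models import ColQwen3, ColQwen3Processor

BASE = "VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1"

# Load the base retriever in bfloat16. flash_attention_2 needs the flash-attn
# package and a supported GPU; drop that argument to use the default attention.
base = ColQwen3.from_pretrained(
    BASE,
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)
# Attach the ~150 MB LoRA adapter from this repository on top of the base.
model = PeftModel.from_pretrained(base, "rudore/colqwen3-2b")
# The processor (tokenizer and image preprocessing) comes from the base checkpoint.
processor = ColQwen3Processor.from_pretrained(BASE)
```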

License

Apache 2.0, inherited from VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1.
