rudore/colqwen3-2b

LoRA adapter for Russian-language business-document retrieval, trained on top of VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1. Only the adapter is published (~150 MB); load it onto the base model with peft.PeftModel.from_pretrained.

On the industrial X5-private holdout (Russian business documents, rephrased queries), rudore/colqwen3-2b leads the system ranking at nDCG@5 = 0.8436 and beats the strongest dense-text baseline (pplx-embed-v1-4B at 0.7683) by +0.0753 absolute, despite using half the parameter count.

Training recipe

Loss. ColbertPairwiseCELoss, temperature τ = 0.02, in-batch only (num_negs = 0).
Batch. per_device_train_batch_size = 16, gradient_accumulation_steps = 1.
LoRA. rank r = 32, α = 32, dropout = 0.1, Gaussian init, bias none, targets (q/k/v/o_proj | down/gate/up_proj | custom_text_proj).
Optimizer. paged_adamw_8bit, learning_rate = 5e-5, warmup_steps = 500, weight_decay = 0.0, 1 epoch (~11k optimizer steps).
Sampler. Group-aware sampler, --sampler-seed 42.
Training data. RuDoRe rephrased train split (rudore/RuDoRe publishes the eval split only).
Checkpoint selection. Best by nDCG@5 on a held-out RuDoRe validation shard, then re-scored on the final test set.
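
The LoRA line of the recipe maps onto peft.LoraConfig fields roughly as follows. This is a minimal sketch, not the actual training config: the card names the target module groups but not the exact pattern, so the TARGET_MODULES regex is an assumption, and LORA_KWARGS, build_lora_config and is_targeted are hypothetical names introduced for illustration.

```python
import re

# Assumed regex: the card lists the targeted groups, not the pattern itself.
# When target_modules is a string, peft full-matches it against each module name,
# so this pattern selects any module whose last path component is listed below.
TARGET_MODULES = (
    r"(?:.*\.)?(?:q_proj|k_proj|v_proj|o_proj"
    r"|down_proj|gate_proj|up_proj|custom_text_proj)"
)

# Values taken from the recipe above.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 32,                 # scaling factor = lora_alpha / r = 1.0
    "lora_dropout": 0.1,
    "init_lora_weights": "gaussian",  # Gaussian init
    "bias": "none",
    "target_modules": TARGET_MODULES,
}


def build_lora_config():
    """Build a peft.LoraConfig from the kwargs above.

    peft is imported lazily so the kwargs can be inspected without it installed.
    """
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


def is_targeted(module_name):
    """Return True if the assumed regex would select this module name."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None
```

With r = α = 32 the LoRA update is applied at a scaling factor of 1.0. Calling is_targeted on names from model.named_modules() is a quick way to check which layers a pattern like this would actually adapt before training.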

Evaluation

The headline metric is nDCG@5. The ViDoRe V3 figure is the macro (unweighted) average over its seven sub-tasks.

Slice	Base	Fine-tuned	Δ
ViDoRe V3 (OOD, 7 tasks, macro)	0.5366	0.5521	+0.0156
RuDoRe (in-domain)	0.8394	0.8979	+0.0585
X5-private (industrial holdout)	0.8208	0.8436	+0.0228

All three Δ values are positive: the LoRA adaptation improves both Russian slices (in-domain RuDoRe and the X5-private industrial holdout), and the out-of-domain ViDoRe V3 macro improves rather than regresses, so there is no sign of catastrophic forgetting on this benchmark. Hardware for all eval runs: NVIDIA H200, CUDA 12.8, PyTorch 2.8, FlashAttention 2.8.3.
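
For reference, nDCG@5 and the macro average can be sketched as below. This is the standard textbook definition with linear gain, not the evaluation harness used for the table; dcg_at_k, ndcg_at_k and macro_average are hypothetical names, and the relevance labels are assumed to be non-negative numbers.

```python
import math


def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the first k relevance labels (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5, all_relevances=None):
    """nDCG@k for a single query.

    ranked_relevances: labels of the retrieved documents, in system rank order.
    all_relevances: labels of every judged document for the query; defaults to
        the retrieved list when the full judgment pool is not available.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def macro_average(per_task_scores):
    """Unweighted mean of per-task scores, so every task counts equally."""
    return sum(per_task_scores.values()) / len(per_task_scores)
```

A per-task score is the mean of ndcg_at_k over that task's queries, and the macro figure is then macro_average over the seven task scores. A large sub-task therefore does not dominate a small one.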

Usage

from peft import PeftModel
from sauerkrautlm_colpali.models import ColQwen3, ColQwen3Processor

BASE = "VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1"

base = ColQwen3.from_pretrained(
    BASE,
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)
model = PeftModel.from_pretrained(base, "rudore/colqwen3-2b")
processor = ColQwen3Processor.from_pretrained(BASE)
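
Models in the ColPali/ColQwen family emit one vector per query token and per page patch, and relevance is usually computed with ColBERT-style late interaction (MaxSim). The sketch below illustrates that scoring rule on plain Python lists so it runs without the model. It is an illustration only, not the scoring code shipped with sauerkrautlm_colpali, and maxsim_score and rank_pages are hypothetical names.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector take its best-matching
    page vector (max dot product), then sum over the query vectors."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs, pages):
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy example: two query vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query vectors -> score 2.0
page_b = [[1.0, 0.0]]              # matches only the first     -> score 1.0
print(rank_pages(query, [page_b, page_a]))  # [1, 0]
```

In practice the multi-vector embeddings come from the model loaded above and the scoring is batched on the GPU; the loop version here only makes the per-token max-then-sum structure explicit.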

License

Apache 2.0, inherited from VAGOsolutions/SauerkrautLM-ColQwen3-2b-v0.1.
