CosDoc — Attention nq=1 (baseline)
CosDoc is a visual document embedding model trained with supervised metric learning, with hard examples selected by a reinforcement-learning professor network.
Pooler variant: Attention nq=1 (baseline), the standard attention pooler with num_queries=1.
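As a rough illustration of what an attention pooler with a single query does, the sketch below collapses a sequence of token vectors into one vector using scaled dot-product attention. It is a minimal, standard-library-only sketch, not the CosDoc implementation: the name `attention_pool` and its arguments are hypothetical, and the actual pooler most likely adds learned key/value projections and may use multiple heads.

```python
import math

def attention_pool(tokens, query):
    """Pool token vectors into one vector with a single query (num_queries=1).

    tokens: list of token vectors, each a list of floats of length d
    query:  one query vector of length d (learned, in a real model)
    """
    d = len(query)
    # Scaled dot-product score between the query and every token.
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    # Numerically stable softmax over the sequence dimension.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of token vectors is the pooled representation.
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return pooled, weights
```

With `num_queries=1` the output is a single vector per document, which a projection head can then map to the final embedding.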
Architecture
| Component | Value |
|---|---|
| Backbone | InternVL3-2B (OpenGVLab/InternVL3-2B) |
| Cut layer | 27 |
| Pooler | attention (num_queries=1) |
| Embedding dim | 1536 |
| Loss | Sub-Center CosFace (m=0.35, s=32, k=3) |
| Embedding prompt | <image> Analyze this document |
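To make the loss row concrete, here is a minimal sketch of a sub-center CosFace loss for one sample, using the hyperparameters from the table (m=0.35, s=32, k sub-centers per class). It follows the usual formulation: the class similarity is the maximum cosine over that class's sub-centers, the margin m is subtracted from the target class only, and the result is scaled by s before cross-entropy. The function and argument names are hypothetical, and the training code in the repository may differ in detail.

```python
import math

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _cosine(a, b):
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))

def subcenter_cosface_loss(embedding, centers, label, m=0.35, s=32.0):
    """Loss for one sample; centers[c] is a list of k sub-center vectors for class c."""
    logits = []
    for c, subcenters in enumerate(centers):
        # Class similarity is the best-matching sub-center (max over k).
        cos_c = max(_cosine(embedding, w) for w in subcenters)
        # CosFace: subtract the additive margin from the target class only.
        if c == label:
            cos_c -= m
        logits.append(s * cos_c)
    # Cross-entropy computed with a stable log-sum-exp.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label], logits
```

The sub-centers let a class with several visual layouts occupy more than one region of the embedding space, while the margin still forces separation between classes.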
Performance (LA-CDIP, full validation pairs)
| Dataset | EER |
|---|---|
| LA-CDIP (5-fold CV) | 1.44% |
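For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false accept rate equals the false reject rate over a set of scored pairs. The sketch below approximates it by sweeping thresholds over the observed scores. It assumes higher scores mean "more similar" and labels of 1 for same-class pairs and 0 otherwise; the function name is hypothetical and this is not the evaluation script used for the number above.

```python
def equal_error_rate(scores, labels):
    """Approximate EER from pair similarity scores and binary pair labels."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for thr in candidates:
        # False accept rate: different-class pairs scored at or above the threshold.
        far = sum(s >= thr for s in negatives) / len(negatives)
        # False reject rate: same-class pairs scored below the threshold.
        frr = sum(s < thr for s in positives) / len(positives)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```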
Source run: Sprint3b_S0_subcenter_cosface_seed42_s32k3_fase1_E10
Usage
import torch
from huggingface_hub import hf_hub_download
from cavl_doc.models.backbone_loader import load_model
from cavl_doc.models.modeling_cavl import build_cavl_model

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download fine-tuned weights
ckpt_path = hf_hub_download(repo_id="Jpcosta90/cosdoc", filename="best_model.pt")
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust
ckpt = torch.load(ckpt_path, map_location=device, weights_only=False)
cfg = ckpt["config"]

# Load the backbone and tokenizer, then rebuild the model from the saved config
backbone, _, tokenizer, _, _ = load_model("InternVL3-2B")
model = build_cavl_model(
    backbone=backbone,
    cut_layer=cfg["cut_layer"],
    pooler_type=cfg["pooler_type"],
    num_queries=cfg.get("num_queries", 1),
)

# Restore the fine-tuned pooler and head weights
model.pool.load_state_dict(ckpt["siam_pool"])
model.head.load_state_dict(ckpt["siam_head"])
model.eval().to(device)
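The snippet above stops once the model is loaded. Because the model is trained with a cosine-margin loss, two document embeddings would typically be compared by cosine similarity. The sketch below shows that comparison on plain Python lists so it runs without a GPU. The helper names and the 0.5 threshold are hypothetical placeholders, not values from this model card; a real threshold should be tuned on validation pairs. With tensors, `torch.nn.functional.cosine_similarity` does the same job.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_class(emb_a, emb_b, threshold=0.5):
    """Decide whether two documents match; the threshold is a placeholder to tune."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```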
Citation
@misc{cosdoc2026,
  title  = {CosDoc: Cosine-Margin Document Embeddings with RL-guided Hard Mining},
  author = {Costa, João Paulo},
  year   = {2026},
  url    = {https://huggingface.co/Jpcosta90/cosdoc}
}
Evaluation results
- EER on LA-CDIP (self-reported): 0.014