CosDoc — Attention nq=1 (baseline)

CosDoc is a visual document embedding model trained with supervised metric learning, using a reinforcement-learning (RL) "professor" (teacher) network to select hard training examples.

Pooler variant: Attention nq=1 (baseline), a standard attention pooler with num_queries=1.
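The card does not include the pooler's code. As a rough illustration only, attention pooling with one query scores every token vector against a single query vector and returns the softmax-weighted average of the tokens. The sketch below is a simplified, hypothetical version. It uses a single head and no learned key/value projections, and the query is passed in rather than learned. The cavl_doc pooler may differ in these details.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens: Sequence[Vector], query: Vector) -> List[float]:
    """Pool a sequence of token vectors into one vector using a single query.

    Simplified sketch (hypothetical): keys and values are the token vectors
    themselves, with one head and no learned projections.
    """
    d = len(query)
    # Scaled dot-product score of the query against each token.
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of token vectors, dimension by dimension.
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]


# A zero query gives uniform weights, so the result is the mean token.
print(attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0]))
```

With num_queries=1 the pooler yields exactly one vector per document. Poolers with more queries would produce several vectors that must then be combined.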

Architecture

Component	Value
Backbone	InternVL3-2B (`OpenGVLab/InternVL3-2B`)
Cut layer	27
Pooler	attention (num_queries=1)
Embedding dim	1536
Loss	Sub-Center CosFace (margin m=0.35, scale s=32, k=3 sub-centers per class)
Embedding prompt	`<image> Analyze this document`
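Sub-Center CosFace combines CosFace's additive cosine margin with the sub-center idea from Sub-center ArcFace. Each class has k prototype vectors, and a sample is compared with its nearest one. The sketch below computes the per-sample loss in plain Python under the hyperparameters listed in the table (m=0.35, s=32). It is a hypothetical illustration, not the training code used for this model. In particular, the class centers are passed in rather than stored as learnable weights.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def subcenter_cosface_loss(
    embedding: Vector,
    centers: Sequence[Sequence[Vector]],  # centers[c][j]: j-th of k sub-centers of class c
    label: int,
    m: float = 0.35,
    s: float = 32.0,
) -> float:
    """Cross-entropy over scaled cosine logits, with margin m on the true class."""
    # Similarity to a class = similarity to its best-matching sub-center.
    cos_per_class = [max(cosine(embedding, sub) for sub in cls) for cls in centers]
    # CosFace: subtract the margin from the target-class cosine, then scale by s.
    logits = [s * (c - m) if idx == label else s * c for idx, c in enumerate(cos_per_class)]
    # Stable log-sum-exp for the softmax denominator.
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_z - logits[label]


centers = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],   # class 0, k=3 sub-centers
    [[0.0, 1.0], [0.0, -1.0], [-1.0, 0.0]],  # class 1, k=3 sub-centers
]
print(subcenter_cosface_loss([1.0, 0.0], centers, label=0))  # near zero
print(subcenter_cosface_loss([1.0, 0.0], centers, label=1))  # large
```

Taking the max over sub-centers lets a class with several visual layouts keep separate prototypes. The margin m forces the correct class to win by at least m in cosine space.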

Performance (LA-CDIP, full validation pairs)

Dataset	EER (equal error rate; lower is better)
LA-CDIP (5-fold CV)	1.44%
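The equal error rate is the operating point where the false-acceptance rate (impostor pairs scored as matches) equals the false-rejection rate (genuine pairs scored as non-matches). The sketch below estimates it from pair similarity scores by sweeping candidate thresholds. It is an assumed, simple procedure for illustration; the evaluation code behind the 1.44% figure is not shown in this card and may use a ROC-interpolation variant instead.

```python
from typing import Sequence, Tuple


def compute_eer(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for similarity scores where higher = more similar.

    FAR(t) = fraction of impostor scores >= t (wrongly accepted)
    FRR(t) = fraction of genuine scores  <  t (wrongly rejected)
    The threshold minimizing |FAR - FRR| is chosen; EER is their average there.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores: one impostor pair (0.75) outscores one genuine pair (0.4).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75])
print(eer, threshold)  # 0.25 0.7
```

Under this definition, an EER of 1.44% means that at the balanced threshold roughly 1.44% of impostor pairs are accepted and 1.44% of genuine pairs are rejected.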

Source run: Sprint3b_S0_subcenter_cosface_seed42_s32k3_fase1_E10

Usage

import torch
from huggingface_hub import hf_hub_download
from cavl_doc.models.backbone_loader import load_model
from cavl_doc.models.modeling_cavl import build_cavl_model

device = "cuda" if torch.cuda.is_available() else "cpu"

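# Download the fine-tuned weights and the training config.
# weights_only=False unpickles arbitrary Python objects, so only load
# checkpoints from sources you trust.
ckpt_path = hf_hub_download(repo_id="Jpcosta90/cosdoc", filename="best_model.pt")
ckpt = torch.load(ckpt_path, map_location=device, weights_only=False)
cfg = ckpt["config"]

# Load the pretrained InternVL3-2B backbone, then attach the pooler and head
# using the settings stored in the checkpoint config.
backbone, _, tokenizer, _, _ = load_model("InternVL3-2B")
model = build_cavl_model(
    backbone=backbone,
    cut_layer=cfg["cut_layer"],
    pooler_type=cfg["pooler_type"],
    num_queries=cfg.get("num_queries", 1),
)

# The checkpoint stores only the pooler and head weights; restore them.
model.pool.load_state_dict(ckpt["siam_pool"])
model.head.load_state_dict(ckpt["siam_head"])
model.eval().to(device)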
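The snippet above stops after loading the model, and the card does not show the forward call that produces an embedding, so that step is not reproduced here. Because the model is trained with a cosine-margin loss and evaluated on document pairs, two embeddings are presumably compared by cosine similarity against a decision threshold. The sketch below shows only that comparison step on plain vectors. The threshold value and the function names are hypothetical, and a real threshold would be tuned on validation pairs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_class(emb_a: Sequence[float], emb_b: Sequence[float], threshold: float) -> bool:
    """Hypothetical verification rule: accept the pair if cosine >= threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 2-D vectors stand in for real 1536-dimensional embeddings.
print(is_same_class([1.0, 1.0], [2.0, 2.0], threshold=0.9))  # True
print(is_same_class([1.0, 0.0], [0.0, 1.0], threshold=0.5))  # False
```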

Citation

@misc{cosdoc2026,
  title  = {CosDoc: Cosine-Margin Document Embeddings with RL-guided Hard Mining},
  author = {Costa, João Paulo},
  year   = {2026},
  url    = {https://huggingface.co/Jpcosta90/cosdoc}
}


Evaluation results (self-reported): EER on LA-CDIP = 0.014