chest2vec_0.6B

Chest-radiology text embedding model: Qwen/Qwen3-Embedding-0.6B, LoRA-adapted with contrastive training for chest CT / CXR report retrieval. Embeddings are produced by left-padding-aware last-token (EOS) pooling followed by L2 normalization. Embedding dim: 1024.
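
As a rough illustration of the pooling described above, here is a minimal sketch in plain Python. It is not the model's own code: the names `last_token_pool` and `l2_normalize` are hypothetical, and nested lists stand in for the tensors a real implementation would use. It assumes the attention mask marks real tokens with 1 and padding with 0.

```python
import math

def last_token_pool(hidden, mask):
    """Pick the hidden state of the last real token in each sequence.

    hidden: [batch][seq_len][dim] nested lists of floats
    mask:   [batch][seq_len] attention mask, 1 = real token, 0 = padding
    """
    pooled = []
    for states, m in zip(hidden, mask):
        if m[-1] == 1:
            # Left-padded (or unpadded): the final position is a real token.
            last = len(m) - 1
        else:
            # Right-padded: the last real token sits at index sum(mask) - 1.
            last = sum(m) - 1
        pooled.append(list(states[last]))
    return pooled

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

# Toy batch: row 0 is left-padded, row 1 is right-padded.
# The [9.0, 9.0] entries are padding positions that must be ignored.
hidden = [
    [[9.0, 9.0], [3.0, 4.0]],
    [[1.0, 0.0], [9.0, 9.0]],
]
mask = [[0, 1], [1, 0]]
embeddings = [l2_normalize(v) for v in last_token_pool(hidden, mask)]
print(embeddings)
```

Checking the last mask position per row means the same function handles either padding side, which is why the pooling is called padding-aware.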

Self-contained `AutoModel`

The LoRA adapter is merged into the weights (model.safetensors) and the tokenizer is bundled, so loading needs no chest2vec package and no download of the base Qwen3-Embedding weights:

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("chest2vec/chest2vec_0.6B", trust_remote_code=True).eval()
tok   = AutoTokenizer.from_pretrained("chest2vec/chest2vec_0.6B", trust_remote_code=True)

docs = ["Bibasilar atelectasis with small bilateral pleural effusions. Cardiomegaly."]
doc_emb = model.embed_texts(docs, tokenizer=tok)                      # [N, 1024], L2-normalized

# instruction-conditioned query
q_emb = model.embed_instruction_query(
    "Retrieve the chest CT report that is similar to the given report.",
    ["pleural effusion and cardiomegaly"], tokenizer=tok)
vals, idx = model.cosine_topk(q_emb, doc_emb, k=5)
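
For readers who want to see what a cosine top-k lookup computes, the following is a small sketch under stated assumptions. It is not the implementation behind `model.cosine_topk`: `topk_by_cosine` is a hypothetical helper, and it relies on the embeddings already being L2-normalized, so cosine similarity reduces to a dot product.

```python
def topk_by_cosine(queries, docs, k=5):
    """Return (scores, indices) of the k most similar docs for each query.

    Assumes every vector is already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    k = min(k, len(docs))
    all_vals, all_idx = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, d)) for d in docs]
        # Sort document indices by descending score and keep the first k.
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
        all_vals.append([scores[i] for i in order])
        all_idx.append(order)
    return all_vals, all_idx

# Toy unit vectors standing in for real 1024-dim embeddings.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
toy_queries = [[0.8, 0.6]]
toy_vals, toy_idx = topk_by_cosine(toy_queries, toy_docs, k=2)
print(toy_idx, toy_vals)
```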

Matryoshka embeddings

The model is trained with Matryoshka representation learning (MRL), so embeddings can be truncated to 512 or 256 dims: keep the first N dims, then re-normalize.

emb512 = model.embed_texts(docs, tokenizer=tok, dim=512)
emb256 = model.embed_texts(docs, tokenizer=tok, dim=256)

Recommended dims: 1024 (full) · 512 · 256 (config.matryoshka_dims). Use the same dim for query and corpus.
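
If embeddings were already stored at full width, the same truncation can be applied after the fact. The sketch below shows the "keep first N dims, re-normalize" step in plain Python; `truncate_embedding` is a hypothetical helper, not part of the model's API, and a 4-dim toy vector stands in for a 1024-dim embedding.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]          # unit-length toy embedding
short = truncate_embedding(full, 2)  # analogous to 1024 -> 512
print(short)
```

Re-normalizing matters because the truncated vector no longer has unit length, and a dot product between truncated vectors is only a cosine similarity once they are rescaled.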

Recommended instructions

Queries are instruction-conditioned with the template Instruct: {instruction}\nQuery: {report}. Apply the instruction to the query side only; embed the corpus without an instruction. The model was trained on chest CT and CXR reports across these instruction families:

Retrieval — Retrieve the chest CT report that is similar to the given report. · Retrieve the CXR report that is similar to the given report. · Retrieve the CXR report that is similar to the given report with prior reference omitted.

Summarization — Summarize the following chest CT report · Summarize the following CXR report · Summarize the given report.

Entity extraction (leaf) — Given the following chest CT report, extract the presence/absence of entities · Given the following CXR report, extract the presence/absence of entities

Entity extraction (upper/coarse) — Given the following chest CT report, extract the presence/absence of upper-level entities · Given the following CXR report, extract the presence/absence of upper class entities

Anatomy-specific — From the following chest {CT report | X-ray report}, extract and return only the findings related to {REGION}, ignoring all information about other structures.

CT regions: lungs · airways and trachea · pleura · mediastinum and hilum · cardiovascular system · chest wall · bones and spine · upper abdomen · lower neck
CXR regions: lungs and airways · pleura · hila and mediastinum · cardiovascular system · musculoskeletal structures and chest wall · tubes, catheters, and support devices · abdomen
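
To make the templates above concrete, here is a small sketch that fills the anatomy-specific instruction and wraps a query in the Instruct/Query format. The helper names `anatomy_instruction` and `format_query` are hypothetical, and inserting the region name exactly as listed is an assumption. `model.embed_instruction_query` presumably applies the query template itself, so `format_query` is shown only to illustrate the resulting string.

```python
CT_REGIONS = [
    "lungs", "airways and trachea", "pleura", "mediastinum and hilum",
    "cardiovascular system", "chest wall", "bones and spine",
    "upper abdomen", "lower neck",
]
CXR_REGIONS = [
    "lungs and airways", "pleura", "hila and mediastinum",
    "cardiovascular system", "musculoskeletal structures and chest wall",
    "tubes, catheters, and support devices", "abdomen",
]

def anatomy_instruction(modality, region):
    """Fill the anatomy-specific template for modality 'CT' or 'CXR'."""
    regions = {"CT": CT_REGIONS, "CXR": CXR_REGIONS}[modality]
    if region not in regions:
        raise ValueError(f"unknown {modality} region: {region!r}")
    report_kind = "CT report" if modality == "CT" else "X-ray report"
    return (
        f"From the following chest {report_kind}, extract and return only "
        f"the findings related to {region}, ignoring all information "
        f"about other structures."
    )

def format_query(instruction, report):
    """Assemble the instruction-conditioned query string."""
    return f"Instruct: {instruction}\nQuery: {report}"

instruction = anatomy_instruction("CXR", "pleura")
print(format_query(instruction, "Small left pleural effusion."))
```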

Details

Base: Qwen/Qwen3-Embedding-0.6B (Apache-2.0). The architecture is rebuilt from the bundled config, and the merged weights are loaded from this repo. Default attention is sdpa; flash_attention_2 can be used on Ampere or newer GPUs for speed.
Merged weights reproduce the original adapter-based embeddings to cosine ≥ 0.999.
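
A check of this kind can be sketched as a row-wise cosine comparison between two embedding matrices computed for the same texts. The code below is only an illustration: `rowwise_cosine` is a hypothetical helper, and the toy values are made up and are not the model's actual outputs.

```python
import math

def rowwise_cosine(a, b):
    """Cosine similarity between matching rows of two embedding matrices."""
    sims = []
    for u, v in zip(a, b):
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(y * y for y in v))
        sims.append(dot / (norm_u * norm_v))
    return sims

# Toy stand-ins for adapter-based and merged-weight embeddings of the same texts.
adapter_emb = [[1.0, 0.0], [0.0, 1.0]]
merged_emb = [[1.0, 0.01], [0.01, 1.0]]
worst = min(rowwise_cosine(adapter_emb, merged_emb))
print(worst >= 0.999)
```

Reporting the minimum rather than the mean gives a lower bound that holds for every row.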

Safetensors: model size 0.6B params · tensor type BF16
Model tree for chest2vec/chest2vec_0.6B: base model Qwen/Qwen3-0.6B-Base · finetuned Qwen/Qwen3-Embedding-0.6B