AdaRAG-CT
Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation (ECCV 2026)
arXiv · GitHub · Models & Data
Contrastive 3D CT embeddings concentrate 90% of their variance in just 2 of 512 dimensions, and scaling the LLM from 8B to 70B gives no gain: the bottleneck is visual, not generative. AdaRAG-CT compensates by retrieving organ-indexed report sentences and adaptively injecting them during generation, lifting Clinical F1 from 0.420 (CT-Agent) to 0.480.
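One way to check a variance-concentration claim like the one above is to compute each dimension's share of the total variance and count how many dimensions are needed to reach a threshold. The sketch below is a minimal illustration on synthetic vectors, assuming per-coordinate variance; the function name `dims_for_variance` and the toy data are hypothetical, and the paper's own analysis may use a different decomposition (for example, principal components).

```python
import random
from statistics import pvariance


def dims_for_variance(embeddings, threshold=0.90):
    """Return how many dimensions are needed to cover `threshold` of total variance.

    Dimensions are ranked by their own variance, largest first.
    """
    n_dims = len(embeddings[0])
    per_dim = [pvariance([row[d] for row in embeddings]) for d in range(n_dims)]
    total = sum(per_dim)
    covered = 0.0
    for count, var in enumerate(sorted(per_dim, reverse=True), start=1):
        covered += var
        if covered / total >= threshold:
            return count
    return n_dims


# Toy data (not real CT embeddings): 2 high-variance dimensions, 14 near-constant ones.
rng = random.Random(0)
toy = [
    [rng.gauss(0, 10), rng.gauss(0, 10)] + [rng.gauss(0, 0.1) for _ in range(14)]
    for _ in range(500)
]
print(dims_for_variance(toy))
```

A small count relative to the embedding width indicates that most coordinates carry little discriminative signal.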
Results (CT-RATE validation)
| Model | Params | Clin-F1 | BLEU-4 | ROUGE-L | LLaMA score |
|---|---|---|---|---|---|
| CT-CHAT (repro.) | 8B | 0.224 | 0.188 | 0.303 | 6.73 |
| CT-CHAT (repro.) | 70B | 0.161 | 0.182 | 0.321 | 6.02 |
| BTB3D | 8B | 0.258 | 0.213 | - | - |
| CT-Agent | - | 0.420 | 0.231 | 0.490 | - |
| Base (ViSD-Boost + CT-CLIP) | 8B | 0.455 | 0.205 | 0.315 | 7.30 |
| AdaRAG-CT | 8B | 0.480 | 0.242 | 0.354 | 7.75 |
| Base (ViSD-Boost + CT-CLIP) | 70B | 0.405 | 0.213 | 0.334 | 7.10 |
| AdaRAG-CT | 70B | 0.426 | 0.250 | 0.361 | 7.53 |
CT-CHAT rows are reproduced under our unified evaluation protocol. Base (ViSD-Boost + CT-CLIP) is our base model: it takes organ-level ViSD-Boost embeddings plus whole-volume CT-CLIP embeddings as visual input.
Install
conda create -n adaragct python=3.12 && conda activate adaragct
pip install -r requirements.txt
Data & Checkpoints
All weights and retrieval data are on HuggingFace. There are four checkpoints: Base 8B/70B (full weights) and AdaRAG-CT 8B/70B (LoRA adapter + projector, loaded on top of the matching base).
After downloading, place the data/ folder at the repo root and the checkpoint folders under results/, matching the paths in the commands below (for example, results/base_8b/checkpoint and results/adaragct_8b/checkpoint_step_2000). The released predictions, metrics, and logs under results/ already ship with this repo.
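Before running anything, it can help to confirm that the layout matches. The helper below is a small sketch, not part of the repository: `missing_paths` is a hypothetical name, and the list contains only the paths this README mentions, so extend it for the 70B checkpoints if you downloaded them.

```python
from pathlib import Path

# Paths named in this README; adjust the list to the checkpoints you downloaded.
EXPECTED = [
    "data",
    "results/base_8b/checkpoint",
    "results/adaragct_8b/checkpoint_step_2000",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

Run it from the repo root; an empty result means every listed folder is in place.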
Evaluate
Score the released predictions directly:
python -m adaragct.eval.cal_metrics results/adaragct_8b/infer_step_2000.jsonl --output metrics.json
Add --compute-llama-score for the LLaMA score, --bootstrap 1000 for 95% confidence intervals.
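For intuition about what a bootstrap interval measures, here is a minimal percentile-bootstrap sketch. It is not the repository's implementation: the function name `bootstrap_ci`, the per-report scores, and the use of the mean as the statistic are all assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-report scores, not real results.
scores = [0.41, 0.52, 0.47, 0.55, 0.39, 0.50, 0.44, 0.58, 0.46, 0.49]
low, high = bootstrap_ci(scores)
print(f"mean={mean(scores):.3f}  95% CI=({low:.3f}, {high:.3f})")
```

With 1000 resamples, the interval endpoints are roughly the 2.5th and 97.5th percentiles of the resampled statistic.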
Or generate predictions, then score:
# AdaRAG-CT (adaptive retrieval)
python -m adaragct.inference.inference_rag --checkpoint results/adaragct_8b/checkpoint_step_2000 --output pred.jsonl
# Base / no-retrieval (same checkpoint, retrieval disabled)
python -m adaragct.inference.inference_rag --checkpoint results/adaragct_8b/checkpoint_step_2000 --no-rag --output base_pred.jsonl
python -m adaragct.eval.cal_metrics pred.jsonl --output metrics.json
Key inference flags: --no-rag (disable retrieval), --text2text (Text2Text retrieval pipeline), --oracle (oracle context), --top-k, --max-retrievals.
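To make the idea of trigger-based retrieval with a cap concrete, the following is a hedged sketch of a decoding loop. It is an illustration only, not the code in adaragct.inference: `generate_with_retrieval`, `next_token`, and `retrieve` are hypothetical names, and the assumption that a [RAG] token requests context and that retrievals are capped by a `max_retrievals` budget is inferred from the flag names above.

```python
RAG_TOKEN = "[RAG]"


def generate_with_retrieval(next_token, retrieve, max_retrievals=2, max_steps=50):
    """Decode token by token; when the model emits RAG_TOKEN, fetch context.

    `next_token(tokens, context)` and `retrieve(tokens)` are caller-supplied
    stand-ins for the language model and the retriever.
    """
    tokens, context, used = [], [], 0
    for _ in range(max_steps):
        tok = next_token(tokens, context)
        if tok is None:  # end of sequence
            break
        if tok == RAG_TOKEN:
            if used < max_retrievals:
                context.extend(retrieve(tokens))
                used += 1
            continue  # the trigger itself is not part of the report
        tokens.append(tok)
    return tokens, used


# Stub model: asks for retrieval once before writing, then emits a fixed report.
script = iter(["[RAG]", "No", "pleural", "effusion", ".", None])
tokens, used = generate_with_retrieval(
    next_token=lambda toks, ctx: next(script),
    retrieve=lambda toks: ["Pleural spaces are clear."],
)
print(" ".join(tokens), "| retrievals:", used)
```

The cap keeps a model that triggers repeatedly from retrieving without bound; once the budget is spent, further triggers are ignored.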
On-the-fly retrieval encodes each generated probe sentence with a fine-tuned text encoder. The code and indices ship under data/retrieval/, and microsoft/BiomedVLP-CXR-BERT-specialized is downloaded automatically on first run. For a download-free run, use --oracle (precomputed contexts). To reproduce the exact paper numbers, score the released predictions in results/.
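The retrieval step can be pictured as nearest-neighbour search over sentence embeddings. The following is a minimal sketch, assuming cosine similarity and an in-memory, organ-keyed dictionary of (sentence, vector) pairs; the function `top_k_sentences`, the toy vectors, and the example sentences are hypothetical, and the repository's actual index format and encoder calls are not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_sentences(query_vec, index, organ, k=3):
    """Return the k sentences filed under `organ` most similar to `query_vec`."""
    candidates = index.get(organ, [])
    ranked = sorted(candidates, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


# Toy organ-keyed index with 3-d vectors standing in for text-encoder outputs.
index = {
    "lung": [
        ("No pulmonary nodules are seen.", [0.9, 0.1, 0.0]),
        ("Emphysematous changes in both lungs.", [0.1, 0.9, 0.0]),
        ("Mild bronchial wall thickening.", [0.2, 0.7, 0.1]),
    ],
    "heart": [("Heart size is normal.", [0.0, 0.1, 0.9])],
}
probe = [0.15, 0.85, 0.0]  # pretend encoding of a generated probe sentence
print(top_k_sentences(probe, index, "lung", k=2))
```

Keying the index by organ restricts the search to sentences about the structure currently being described, which keeps unrelated findings out of the context.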
Train
AdaRAG-CT trains a [RAG] trigger token on top of a frozen base model via LoRA, mixing oracle and retrieved contexts. Every required input (base checkpoint, CT embeddings, and the precomputed oracle/retrieval contexts) is provided in the HuggingFace bundle, so training runs directly with no extra preprocessing:
# 8B
python -m adaragct.train.train_rag --config configs/adaragct_8b.yaml
# 70B
python -m adaragct.train.train_rag --config configs/adaragct_70b.yaml
Each config points to data/ (embeddings, oracle_context_top3.jsonl, retrieval_context_top3.jsonl) and a base checkpoint under results/. base_8b.yaml is the 8B base-model reference config.
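Mixing oracle and retrieved contexts could look like the sketch below, which picks one source per training example at random. The mixing probability `p_oracle`, the function name `pick_context`, and the in-memory lists are hypothetical; the actual schedule and file schema are defined by the configs and the JSONL files in the bundle.

```python
import random


def pick_context(oracle_ctx, retrieved_ctx, p_oracle=0.5, rng=random):
    """Choose the oracle context with probability `p_oracle`, else the retrieved one."""
    return oracle_ctx if rng.random() < p_oracle else retrieved_ctx


rng = random.Random(0)
oracle = ["oracle sentence A", "oracle sentence B"]
retrieved = ["retrieved sentence X", "retrieved sentence Y"]
batch = [pick_context(oracle, retrieved, p_oracle=0.5, rng=rng) for _ in range(1000)]
share = sum(ctx is oracle for ctx in batch) / len(batch)
print(f"oracle share: {share:.2f}")
```

Exposing the model to retrieved (and therefore imperfect) contexts during training is one plausible way to keep it from over-trusting whatever is injected at inference time.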
Citation
@misc{liang2026embeddingbottleneckadaptiveretrievalaugmented,
title={Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation},
author={Renjie Liang and Yiling Ma and Yang Xing and Zhengkang Fan and Jinqian Pan and Chengkun Sun and Li Li and Kuang Gong and Jie Xu},
year={2026},
eprint={2603.15822},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.15822},
}
Acknowledgements
CT-RATE / CT-CLIP · ViSD-Boost · LLaVA · Self-RAG