- PaperLens-7B-Vision-OpenReview-ICLR
PaperLens-7B-Vision-OpenReview-ICLR
SFT fine-tune of Qwen/Qwen2.5-VL-7B-Instruct trained to predict ICLR-style Accept/Reject verdicts on academic papers from the paper text plus per-page screenshots.
- Modality: text + per-page image screenshots
- Training data: iclr-21k (85/5/10 balanced split)
- Checkpoint: step 2648, end of epoch 2 of 4 (7B β ep2 by convention)
- Hyperparams: LR 1e-6, cosine_then_constant scheduler (decay_ratio 0.75, min_lr_rate 0.001), batch size 16, cutoff_len 24480, framework LLaMA-Factory + FSDP2.
Quickstart β serve + submit a PDF or LaTeX source dir
Easiest path uses the PaperLens orchestrator (paperprep + scoring server + web UI all wired up). Clone, run setup, and chain into the UI:
git clone https://github.com/zlab-princeton/PaperLens.git
cd PaperLens
uv tool install . # installs the `paperlens` CLI globally on PATH
paperlens setup --serve # in the wizard, pick: size=7B Β· modality=vision Β· domain=iclr
# β web UI on http://localhost:8003 (PDF upload + LaTeX dir browse)
Or hit the API directly (FastAPI on the same port):
# Submit an anonymized PDF; poll for the verdict
JOB=$(curl -s -F file=@anonymized.pdf http://localhost:8003/submit | jq -r .job_id)
curl http://localhost:8003/status/$JOB # β job dict: state, verdict, p_accept, ...
# Submit a LaTeX source directory (anonymized) or an arXiv id
curl -X POST http://localhost:8003/submit_latex \
-H "Content-Type: application/json" \
-d '{"path": "/abs/path/to/anonymized_latex_dir"}'
curl -X POST http://localhost:8003/submit_arxiv \
-H "Content-Type: application/json" \
-d '{"arxiv_id": "2511.08364"}'
Headless one-shot (no server):
paperlens run /abs/path/to/anonymized.pdf
Lower-level: stand up just a vLLM scoring server with pre-prep'd sharegpt rows (skips paperprep):
vllm serve skonan/PaperLens-7B-Vision-OpenReview-ICLR --task generate --gpu-memory-utilization 0.85
# OpenAI-compat API on :8000 β format prompts per the "Prompt format" section below.
Test results (in-distribution, calibrated)
Evaluated on iclr-balanced-test. Calibration threshold picked on iclr-balanced-val. Score = logprob(Accept) β logprob(Reject) at the decision-token position. pA = predicted accept rate; A_rec / R_rec = accept / reject recall.
| n_test | Acc | AUC | pA | A_rec | R_rec |
|---|---|---|---|---|---|
| 2498 | 68.4% | 0.663 | 53% | 71.0% | 65.7% |
Note on training-size asymmetry
ICLR-trained models saw ~21k examples (4 epochs β 2644 / 5296 steps). Arxiv-trained vs ICLR-trained models saw a ~3Γ data gap β direct comparisons should account for it.
Prompt format
Inputs are sharegpt-style 3-turn conversations: system, human, gpt. SYSTEM is the same string across all 8 PaperLens models. USER preamble differs per training domain. Vision variants append one <image> token per page-screenshot at the end of the user message.
SYSTEM (all PaperLens models)
You are an expert academic reviewer tasked with evaluating research papers.
ICLR-trained USER preamble (verbatim)
I am giving you a paper. I want to predict its acceptance outcome at ICLR.
- Your answer will either be: \boxed{Accept} or \boxed{Reject}
- Note: ICLR generally has a ~30% acceptance rate
# <PAPER TITLE>
...paper body in markdown...
Vision variant β append <image> tokens
After the markdown body, append a space-separated run of <image> placeholders, one per page screenshot (typically 7β9 for arxiv, 8β10 for iclr). The images field on the inference request is a parallel list of PNG paths.
...end of markdown body...
<image> <image> <image> <image> <image> <image> <image>
ASSISTANT (gold)
Outcome: \boxed{Accept}
or
Outcome: \boxed{Reject}
At inference, the decision logprobs at the boxed-token position (5th generated token under the qwen template) are used for calibration; either parse the text or read logprobs directly.
Concrete example (TEXT, ARXIV-trained)
[SYSTEM]
You are an expert academic reviewer tasked with evaluating research papers.
[USER]
I am giving you a paper submitted to a top machine-learning venue. Predict its acceptance outcome.
- Your answer will either be: \boxed{Accept} or \boxed{Reject}
- Note: typical top-tier ML venues have ~25-30% acceptance rates
# SSAST: SELF-SUPERVISED AUDIO SPECTROGRAM TRANSFORMER
## Abstract
... ~32k chars of paper body ...
[ASSISTANT]
Outcome: \boxed{Accept}
Concrete example (VISION, ICLR-trained)
[SYSTEM]
You are an expert academic reviewer tasked with evaluating research papers.
[USER]
I am giving you a paper. I want to predict its acceptance outcome at ICLR.
- Your answer will either be: \boxed{Accept} or \boxed{Reject}
- Note: ICLR generally has a ~30% acceptance rate
# ROBUST TRAINING WITH ENSEMBLE CONSENSUS
## Abstract
... ~1k chars of paper body (vision body is much shorter than text body) ...
<image> <image> <image> <image> <image> <image> <image> <image> <image>
[ASSISTANT]
Outcome: \boxed{Reject}
(With images = [page_1.png, page_2.png, β¦, page_9.png] on the request.)
Related models + datasets in the PaperLens collection
All 8 single-domain SFT models (this one plus 7 siblings) plus the companion PaperLens-Text and PaperLens-Vision datasets live in the PaperLens collection. Pairwise comparisons across {3B, 7B} Γ {text, vision} Γ {arxiv, openreview-iclr} are intended.
Model tree for skonan/PaperLens-7B-Vision-OpenReview-ICLR
Base model
Qwen/Qwen2.5-VL-7B-Instruct