PaperLens-7B-Vision-OpenReview-ICLR

SFT fine-tune of Qwen/Qwen2.5-VL-7B-Instruct trained to predict ICLR-style Accept/Reject verdicts on academic papers from the paper text plus per-page screenshots.

Modality: text + per-page image screenshots
Training data: iclr-21k (85/5/10 balanced split)
Checkpoint: step 2648, end of epoch 2 of 4 (7B → ep2 by convention)
Hyperparams: LR 1e-6, cosine_then_constant scheduler (decay_ratio 0.75, min_lr_rate 0.001), batch size 16, cutoff_len 24480, framework LLaMA-Factory + FSDP2.

Quickstart — serve + submit a PDF or LaTeX source dir

Easiest path uses the PaperLens orchestrator (paperprep + scoring server + web UI all wired up). Clone, run setup, and chain into the UI:

git clone https://github.com/zlab-princeton/PaperLens.git
cd PaperLens
uv tool install .              # installs the `paperlens` CLI globally on PATH
paperlens setup --serve        # in the wizard, pick: size=7B · modality=vision · domain=iclr
# → web UI on http://localhost:8003 (PDF upload + LaTeX dir browse)

Or hit the API directly (FastAPI on the same port):

# Submit an anonymized PDF; poll for the verdict
JOB=$(curl -s -F file=@anonymized.pdf http://localhost:8003/submit | jq -r .job_id)
curl http://localhost:8003/status/$JOB        # → job dict: state, verdict, p_accept, ...

# Submit a LaTeX source directory (anonymized) or an arXiv id
curl -X POST http://localhost:8003/submit_latex \
     -H "Content-Type: application/json" \
     -d '{"path": "/abs/path/to/anonymized_latex_dir"}'
curl -X POST http://localhost:8003/submit_arxiv \
     -H "Content-Type: application/json" \
     -d '{"arxiv_id": "2511.08364"}'

Headless one-shot (no server):

paperlens run /abs/path/to/anonymized.pdf

Lower-level: stand up just a vLLM scoring server with pre-prep'd sharegpt rows (skips paperprep):

vllm serve skonan/PaperLens-7B-Vision-OpenReview-ICLR --task generate --gpu-memory-utilization 0.85
# OpenAI-compat API on :8000 — format prompts per the "Prompt format" section below.

Test results (in-distribution, calibrated)

Evaluated on iclr-balanced-test. Calibration threshold picked on iclr-balanced-val. Score = logprob(Accept) − logprob(Reject) at the decision-token position. pA = predicted accept rate; A_rec / R_rec = accept / reject recall.

n_test	Acc	AUC	pA	A_rec	R_rec
2498	68.4%	0.663	53%	71.0%	65.7%

Note on training-size asymmetry

ICLR-trained models saw ~21k examples (4 epochs ≈ 2644 / 5296 steps). Arxiv-trained vs ICLR-trained models saw a ~3× data gap — direct comparisons should account for it.

Prompt format

Inputs are sharegpt-style 3-turn conversations: system, human, gpt. SYSTEM is the same string across all 8 PaperLens models. USER preamble differs per training domain. Vision variants append one <image> token per page-screenshot at the end of the user message.

SYSTEM (all PaperLens models)

You are an expert academic reviewer tasked with evaluating research papers.

ICLR-trained USER preamble (verbatim)

I am giving you a paper. I want to predict its acceptance outcome at ICLR.
 - Your answer will either be: \boxed{Accept} or \boxed{Reject}
 - Note: ICLR generally has a ~30% acceptance rate

# <PAPER TITLE>
...paper body in markdown...

Vision variant — append `<image>` tokens

After the markdown body, append a space-separated run of <image> placeholders, one per page screenshot (typically 7–9 for arxiv, 8–10 for iclr). The images field on the inference request is a parallel list of PNG paths.

...end of markdown body...

<image> <image> <image> <image> <image> <image> <image>

ASSISTANT (gold)

Outcome: \boxed{Accept}

Outcome: \boxed{Reject}

At inference, the decision logprobs at the boxed-token position (5th generated token under the qwen template) are used for calibration; either parse the text or read logprobs directly.

Concrete example (TEXT, ARXIV-trained)

[SYSTEM]
You are an expert academic reviewer tasked with evaluating research papers.

[USER]
I am giving you a paper submitted to a top machine-learning venue. Predict its acceptance outcome.
 - Your answer will either be: \boxed{Accept} or \boxed{Reject}
 - Note: typical top-tier ML venues have ~25-30% acceptance rates

# SSAST: SELF-SUPERVISED AUDIO SPECTROGRAM TRANSFORMER

## Abstract
... ~32k chars of paper body ...

[ASSISTANT]
Outcome: \boxed{Accept}

Concrete example (VISION, ICLR-trained)

[SYSTEM]
You are an expert academic reviewer tasked with evaluating research papers.

[USER]
I am giving you a paper. I want to predict its acceptance outcome at ICLR.
 - Your answer will either be: \boxed{Accept} or \boxed{Reject}
 - Note: ICLR generally has a ~30% acceptance rate

# ROBUST TRAINING WITH ENSEMBLE CONSENSUS

## Abstract
... ~1k chars of paper body (vision body is much shorter than text body) ...

<image> <image> <image> <image> <image> <image> <image> <image> <image>

[ASSISTANT]
Outcome: \boxed{Reject}

(With images = [page_1.png, page_2.png, …, page_9.png] on the request.)

Related models + datasets in the PaperLens collection

All 8 single-domain SFT models (this one plus 7 siblings) plus the companion PaperLens-Text and PaperLens-Vision datasets live in the PaperLens collection. Pairwise comparisons across {3B, 7B} × {text, vision} × {arxiv, openreview-iclr} are intended.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for skonan/PaperLens-7B-Vision-OpenReview-ICLR

Base model

Qwen/Qwen2.5-VL-7B-Instruct

Finetuned

(1104)

this model