OpenShape Benchmark Contamination Artifacts (BMVC 2026, Anonymous Release)

Released anonymously for BMVC 2026 double-blind review. Authorship and provenance will be revealed on acceptance.

This release accompanies a BMVC 2026 submission analyzing benchmark contamination in 3D representation learning. It contains the training prune masks, training/eval configs, per-step training and per-epoch eval metrics, training logs, NN-proxy predictions, and best-LVIS checkpoints for five of the counterfactual training runs that support the paper's headline claims.

What's in this release

This is split across two anonymous repos:

Repo	Contents
`datasets/<anon-org>/openshape-contamination-axes-data`	masks, configs, metrics, training logs, training code, eval splits, NN-proxy predictions
`<anon-org>/openshape-contamination-axes-checkpoints`	best_lvis.pt for 5 counterfactual training runs

File map (paper claim → file)

The paper analyzes how four kinds of training-set "leakage" affect downstream LVIS zero-shot top-1 accuracy. Each run trains an OpenShape PointBERT encoder on the training corpus filtered by a different prune mask, then evaluates on the same held-out LVIS split. "T1" is the single-stack (PointBERT-only) configuration; "T2-MLP" adds a small text-tower MLP and matches the three-stack setup behind the paper's primary reported numbers.
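
As a rough illustration of how a prune mask selects the training subset, the sketch below applies a boolean mask to an ordered list of object UIDs and reports the pruned fraction. It assumes, without confirmation from this release, that each mask is a 1-D boolean array aligned index-for-index with the training-corpus UID order and that `True` marks an object to keep. The function name `apply_prune_mask` and the `true_means_keep` flag are hypothetical; check REPRODUCE.md and the mask-build scripts under `code/` for the actual convention.

```python
import numpy as np


def apply_prune_mask(uids, mask, true_means_keep=True):
    """Return (kept_uids, pruned_fraction) for a boolean prune mask.

    Assumption: `mask` is a 1-D boolean array aligned with `uids`.
    Whether True means "keep" or "prune" is a hypothetical convention
    controlled by `true_means_keep`.
    """
    uids = np.asarray(uids)
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 1 or mask.shape[0] != uids.shape[0]:
        raise ValueError("mask must be 1-D and the same length as uids")
    keep = mask if true_means_keep else ~mask
    pruned_fraction = 1.0 - keep.mean()  # share of objects removed
    return uids[keep], float(pruned_fraction)


# Toy example with made-up UIDs (not taken from the release).
uids = ["a", "b", "c", "d"]
mask = np.array([True, False, True, True])
kept, frac = apply_prune_mask(uids, mask)
print(kept.tolist(), frac)  # ['a', 'c', 'd'] 0.25
```

With a real mask, `frac` is the quantity quoted in the table as the prune percentage (for example 52% for Run-A), provided the keep/prune convention assumed above is the right one.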

Paper claim	Run	Files in this release
C43 Δ_total = 14.27 pp full→pruned, headline result	Run-0 vs Run-A vs Run-C (T2-MLP)	`metrics/run_0_t2_mlp_metrics.jsonl`, `metrics/run_c_v3_t2_mlp_metrics.jsonl`, `checkpoints/run_0_t2_mlp_best_lvis.pt`, `checkpoints/run_c_v3_t2_mlp_best_lvis.pt`, `masks/run_0.npy`, `masks/run_C.npy`
Run-A (52% self-class prune)	Run-A T2-MLP	mask + log + config bundled; T2-MLP checkpoint not available for this release (see "What's NOT here")
Run-B v2 reframed (2.7% per-item-impact prune)	Run-B v2 T2-MLP	`checkpoints/run_b_v2_t2_mlp_best_lvis.pt`, `metrics/run_b_v2_t2_mlp_metrics.jsonl`, `masks/run_B.npy`, `configs/run_b_v2_t2_mlp_config.yaml`
Run-C v3 T2-MLP best LVIS top-1 (45.97%)	Run-C v3 T2-MLP	`checkpoints/run_c_v3_t2_mlp_best_lvis.pt`, `metrics/run_c_v3_t2_mlp_metrics.jsonl`, `configs/run_c_v3_t2_mlp_config.yaml`
Run-NN matched-volume NN-axis ablation	Run-NN T1 + T2-MLP	`checkpoints/run_nn_t1_best_lvis.pt`, `checkpoints/run_nn_t2_mlp_best_lvis.pt`, metrics + configs
Run-0 three-stack baseline	Run-0 T2-MLP	`checkpoints/run_0_t2_mlp_best_lvis.pt`, `metrics/run_0_t2_mlp_metrics.jsonl`, `configs/run_0_t2_mlp_config.yaml`
Own-caption proxy (Table 1)	NN-proxy LVIS assignments	`predictions/lvis_nn_assignments.parquet`
Captioner sweep (G4_C, supplementary)	NN-proxy under 4 caption corpora	`predictions/captioner_sweep_per_object.parquet`
Uni3D replication gap decomposition	per-object preds + decomposition	`predictions/uni3d_per_object_preds.parquet`, `predictions/uni3d_gap_decomp.json`
LVIS eval split	held-out LVIS objects (UID → class)	`splits/lvis_eval.json`
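
The headline Δ_total is a difference between two LVIS top-1 accuracies, expressed in percentage points. A minimal sketch of that arithmetic follows, assuming gold labels come as the UID → class mapping described for `splits/lvis_eval.json`, predictions come as a UID → predicted-class mapping, and Δ is taken as full minus pruned. The function names and toy data are illustrative only; the real prediction parquet files may be shaped differently.

```python
def top1_accuracy(preds, gold):
    """Top-1 accuracy (%) over the UIDs in `gold`.

    preds: dict UID -> predicted class (hypothetical format)
    gold:  dict UID -> gold class, e.g. parsed from splits/lvis_eval.json
    A UID missing from `preds` counts as wrong.
    """
    if not gold:
        raise ValueError("empty eval split")
    correct = sum(preds.get(uid) == cls for uid, cls in gold.items())
    return 100.0 * correct / len(gold)


def delta_pp(acc_full, acc_pruned):
    """Drop from the full-corpus run to the pruned run, in percentage points."""
    return acc_full - acc_pruned


# Toy data, not from the release.
gold = {"u1": "chair", "u2": "mug", "u3": "lamp", "u4": "sofa"}
full_preds = {"u1": "chair", "u2": "mug", "u3": "lamp", "u4": "bed"}
pruned_preds = {"u1": "chair", "u2": "cup", "u3": "desk", "u4": "bed"}

acc_full = top1_accuracy(full_preds, gold)      # 75.0
acc_pruned = top1_accuracy(pruned_preds, gold)  # 25.0
print(delta_pp(acc_full, acc_pruned))           # 50.0
```

Scoring every run against the same gold mapping is what makes the Δ comparable across runs: only the training mask changes, never the eval set.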

Full per-row mapping is in CLAIM_TO_FILE_MAP.md (copied from the paper supplementary).

Repo layout

masks/                      4 × .npy boolean prune masks (one per run)
configs/                    Training YAML configs (one per checkpoint)
metrics/                    Per-step train + per-epoch eval JSONL logs
checkpoints/                best_lvis.pt for 5 runs (Model repo only)
code/                       Training launcher + losses + mask-build scripts
predictions/                NN-proxy per-UID predictions + Uni3D gap decomposition
splits/                     LVIS eval split + ModelNet40 test split
logs/                       Raw training logs (.log) for all 11 train runs
README.md                   This file
CLAIM_TO_FILE_MAP.md        Per-paper-claim file mapping
REPRODUCE.md                Runbook for reproducing every numeric claim
CITATIONS.md                External data dependencies (Cap3D, etc.)
LICENSE                     CC-BY-4.0
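
Each `best_lvis.pt` presumably corresponds to the eval epoch with the highest LVIS top-1 in that run's metrics log. The sketch below scans a JSONL file for that row. The JSONL schema is not documented in this README, so the field names `lvis_top1` and `epoch` are assumptions. Rows without the accuracy field (for example per-step training rows) are skipped. Adjust the keys after inspecting a line of the real file.

```python
import json


def best_eval_row(path, metric="lvis_top1"):
    """Return the JSONL row with the highest `metric` value.

    Assumption: each non-empty line is a JSON object; per-epoch eval rows
    carry `metric` (hypothetical key name), per-step train rows do not.
    """
    best = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            if metric not in row:
                continue  # e.g. a per-step training row
            if best is None or row[metric] > best[metric]:
                best = row
    if best is None:
        raise ValueError(f"no rows with {metric!r} in {path}")
    return best


# Usage (path from this release; key names hypothetical):
# row = best_eval_row(f"{data_path}/metrics/run_c_v3_t2_mlp_metrics.jsonl")
# print(row.get("epoch"), row["lvis_top1"])
```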

What's NOT here

Run-A T1, Run-A T2-MLP, Run-B v2 T1, Run-C T1, and Run-0 T1 checkpoints: these runs were trained on temporary pod scratch storage that was reclaimed before the checkpoints could be snapshotted. Per-step metrics, training logs, masks, and configs ARE in this release for all of them, so the runs can be retrained from the bundled code (see REPRODUCE.md) once the training corpus below has been downloaded.
OpenShape training corpus point clouds — not redistributed (~700 GB); download from the original OpenShape release per CITATIONS.md.
Cap3D caption corpora — required for the own-caption proxy (Table 1); download per CITATIONS.md. Reproducible aggregates for the captioner-sweep rows are bundled at data/audit/G6_captioner_sweep_own.md (in the supplementary ZIP, not this HF repo).

How to load

from huggingface_hub import snapshot_download
import numpy as np
import torch
import yaml  # PyYAML, used to parse the training configs

# Data repo (a dataset repo, so repo_type must be given explicitly)
data_path = snapshot_download(
    repo_id="<anon-org>/openshape-contamination-axes-data",
    repo_type="dataset",
)
mask = np.load(f"{data_path}/masks/run_C.npy")  # boolean prune mask for Run-C
with open(f"{data_path}/configs/run_c_v3_t2_mlp_config.yaml") as f:
    config = yaml.safe_load(f)

# Model repo (snapshot_download defaults to repo_type="model")
ckpt_path = snapshot_download(
    repo_id="<anon-org>/openshape-contamination-axes-checkpoints",
)
state = torch.load(f"{ckpt_path}/run_c_v3_t2_mlp_best_lvis.pt", map_location="cpu")
# State dict keys follow the standard OpenShape PointBERT layout
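
Depending on how the training launcher saved it, a `.pt` file may hold a bare state dict or a wrapper dict, and keys saved from a `DistributedDataParallel` model usually carry a `module.` prefix. This README does not say which applies here, so the helper below is a defensive sketch under those assumptions: the wrapper key names `state_dict` and `model` are guesses, and `extract_state_dict` is a hypothetical name.

```python
def extract_state_dict(obj, wrapper_keys=("state_dict", "model"), prefix="module."):
    """Unwrap a loaded checkpoint into a flat {param_name: tensor} dict.

    Assumptions (not confirmed by this release): the state dict may be
    nested under one of `wrapper_keys`, and parameter names may carry a
    DDP-style `prefix` that should be stripped.
    """
    for key in wrapper_keys:
        if isinstance(obj, dict) and isinstance(obj.get(key), dict):
            obj = obj[key]  # descend into the first matching wrapper
            break
    if not isinstance(obj, dict):
        raise TypeError(f"expected a dict, got {type(obj).__name__}")
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in obj.items()
    }


# Usage with the `state` loaded above, e.g. before model.load_state_dict(...):
# weights = extract_state_dict(state)
# print(len(weights), next(iter(weights)))
```

If the checkpoint is already a bare state dict without prefixes, the helper returns an equivalent copy, so it is harmless to apply either way.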

License

Code, masks, configs, metrics, logs, predictions, splits: CC-BY-4.0.
Checkpoints: released under the same license as the upstream OpenShape weights (MIT — see https://github.com/Colin97/OpenShape_code/blob/master/LICENSE).

Citation

To be added after the double-blind review period.
