OpenShape Benchmark Contamination Artifacts (BMVC 2026, Anonymous Release)
Released anonymously for BMVC 2026 double-blind review. Authorship and provenance will be revealed on acceptance.
This release accompanies a BMVC 2026 submission analyzing benchmark contamination in 3D representation learning. It contains the training prune masks, training/eval configs, eval-time per-step metrics, training logs, NN-proxy predictions, and the best-LVIS checkpoint for the counterfactual training runs used to support the paper's headline claims.
What's in this release
The release is split across two anonymous repos:
| Repo | Contents |
|---|---|
| datasets/<anon-org>/openshape-contamination-axes-data | masks, configs, metrics, training logs, training code, eval splits, NN-proxy predictions |
| <anon-org>/openshape-contamination-axes-checkpoints | best_lvis.pt for 5 counterfactual training runs |
File map (paper claim → file)
The paper analyzes how four kinds of training-set "leakage" affect downstream LVIS zero-shot top-1 accuracy. Each run trains an OpenShape PointBERT encoder against a different prune mask of the training corpus, then evaluates on the same held-out LVIS split. The "T1" stack is single-stack (PointBERT-only); "T2-MLP" adds a small text-tower MLP, matching the three-stack numbers reported as primary in the paper.
| Paper claim | Run | Files in this release |
|---|---|---|
| C43 Δ_total = 14.27 pp full→pruned, headline result | Run-0 vs Run-A vs Run-C (T2-MLP) | metrics/run_0_t2_mlp_metrics.jsonl, metrics/run_c_v3_t2_mlp_metrics.jsonl, checkpoints/run_0_t2_mlp_best_lvis.pt, checkpoints/run_c_v3_t2_mlp_best_lvis.pt, masks/run_0.npy, masks/run_C.npy |
| Run-A (52% self-class prune) | Run-A T2-MLP | mask + log + config bundled; T2-MLP checkpoint not available for this release (see "What's NOT here") |
| Run-B v2 reframed (2.7% per-item-impact prune) | Run-B v2 T2-MLP | checkpoints/run_b_v2_t2_mlp_best_lvis.pt, metrics/run_b_v2_t2_mlp_metrics.jsonl, masks/run_B.npy, configs/run_b_v2_t2_mlp_config.yaml |
| Run-C T2-MLP v3 best 45.97 LVIS top-1 | Run-C v3 T2-MLP | checkpoints/run_c_v3_t2_mlp_best_lvis.pt, metrics/run_c_v3_t2_mlp_metrics.jsonl, configs/run_c_v3_t2_mlp_config.yaml |
| Run-NN matched-volume NN-axis ablation | Run-NN T1 + T2-MLP | checkpoints/run_nn_t1_best_lvis.pt, checkpoints/run_nn_t2_mlp_best_lvis.pt, metrics + configs |
| Run-0 three-stack baseline | Run-0 T2-MLP | checkpoints/run_0_t2_mlp_best_lvis.pt, metrics/run_0_t2_mlp_metrics.jsonl, configs/run_0_t2_mlp_config.yaml |
| Own-caption proxy (Table 1) | NN-proxy LVIS assignments | predictions/lvis_nn_assignments.parquet |
| Captioner sweep (G4_C, supplementary) | NN-proxy under 4 caption corpora | predictions/captioner_sweep_per_object.parquet |
| Uni3D replication gap decomposition | per-object preds + decomposition | predictions/uni3d_per_object_preds.parquet, predictions/uni3d_gap_decomp.json |
| LVIS eval split | held-out LVIS objects (UID → class) | splits/lvis_eval.json |
Full per-row mapping is in CLAIM_TO_FILE_MAP.md (copied from the paper supplementary).
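As a hedged illustration of how a headline number could be re-derived from the per-epoch eval logs, the sketch below scans a metrics JSONL file for the best value of one field and takes the difference between two runs. The field name `lvis_top1` and both helper names are assumptions made for illustration; check the actual keys in the bundled JSONL files (and REPRODUCE.md) before relying on it.

```python
import json


def best_metric(jsonl_path, key="lvis_top1"):
    """Return the maximum value of `key` over all JSONL records that contain it.

    `key` is a hypothetical field name; the real logs may use a different one.
    """
    best = None
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            value = record.get(key)
            # Per-step train records may not carry the eval field; skip them.
            if isinstance(value, (int, float)) and (best is None or value > best):
                best = value
    if best is None:
        raise KeyError(f"no numeric '{key}' field found in {jsonl_path}")
    return best


def delta_pp(baseline_path, pruned_path, key="lvis_top1"):
    """Best baseline score minus best pruned score, in percentage points."""
    return best_metric(baseline_path, key) - best_metric(pruned_path, key)
```

For instance, `delta_pp("metrics/run_0_t2_mlp_metrics.jsonl", "metrics/run_c_v3_t2_mlp_metrics.jsonl")` would compare the two T2-MLP runs listed in the first table row, assuming scores are logged in percent under the assumed key.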
Repo layout
masks/ 4 × .npy boolean prune masks (one per run)
configs/ Training YAML configs (one per checkpoint)
metrics/ Per-step train + per-epoch eval JSONL logs
checkpoints/ best_lvis.pt for 5 runs (Model repo only)
code/ Training launcher + losses + mask-build scripts
predictions/ NN-proxy per-UID predictions + Uni3D gap decomposition
splits/ LVIS eval split + ModelNet40 test split
logs/ Raw training logs (.log) for all 11 train runs
README.md This file
CLAIM_TO_FILE_MAP.md Per-paper-claim file mapping
REPRODUCE.md Runbook for reproducing every numeric claim
CITATIONS.md External data dependencies (Cap3D, etc.)
LICENSE CC-BY-4.0
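A small, hedged sketch for checking a downloaded data-repo snapshot against the layout above. The two lists are transcribed from the listing; the function name is illustrative, and `checkpoints/` is left out because it lives in the model repo only.

```python
from pathlib import Path

# Transcribed from the layout listing; checkpoints/ is omitted (model repo only).
EXPECTED_DIRS = ["masks", "configs", "metrics", "code", "predictions", "splits", "logs"]
EXPECTED_FILES = ["README.md", "CLAIM_TO_FILE_MAP.md", "REPRODUCE.md", "CITATIONS.md", "LICENSE"]


def missing_entries(root):
    """Return the expected directories and files that are absent under `root`."""
    root = Path(root)
    missing = [f"{d}/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries(...)` on the local snapshot directory is expected to return an empty list when the download matches the listing.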
What's NOT here
- Run-A T1, Run-A T2-MLP, Run-B v2 T1, Run-C T1, and Run-0 T1 checkpoints: these were written to temporary pod scratch storage that was reclaimed before a snapshot was taken. Per-step metrics, training logs, masks, and configs are in this release for all of them, so the runs can be reproduced from the bundled code (see REPRODUCE.md).
- OpenShape training corpus point clouds: not redistributed (~700 GB); download from the original OpenShape release per CITATIONS.md.
- Cap3D caption corpora: required for the own-caption proxy (Table 1); download per CITATIONS.md. Reproducible aggregates for the captioner-sweep rows are bundled at data/audit/G6_captioner_sweep_own.md (in the supplementary ZIP, not this HF repo).
How to load
from huggingface_hub import snapshot_download
import numpy as np
import torch

# Data repo (masks, configs, metrics, logs, predictions, splits)
data_path = snapshot_download(
    repo_id="<anon-org>/openshape-contamination-axes-data",
    repo_type="dataset",
)
mask = np.load(f"{data_path}/masks/run_C.npy")  # boolean prune mask
with open(f"{data_path}/configs/run_c_v3_t2_mlp_config.yaml") as f:
    config = f.read()  # raw YAML text

# Model repo (checkpoints)
ckpt_path = snapshot_download(
    repo_id="<anon-org>/openshape-contamination-axes-checkpoints",
)
state = torch.load(f"{ckpt_path}/run_c_v3_t2_mlp_best_lvis.pt", map_location="cpu")
# State dict keys follow the standard OpenShape PointBERT layout
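A loaded mask can be sanity-checked against the prune fractions quoted in the file map (for example, 52% for Run-A). The sketch below is an illustration under stated assumptions: it does not assume whether `True` marks kept or pruned items, so it reports both fractions, and the function name `mask_summary` is hypothetical.

```python
import numpy as np


def mask_summary(mask):
    """Summarise a boolean prune mask without assuming its polarity."""
    mask = np.asarray(mask)
    if mask.dtype != np.bool_:
        raise TypeError(f"expected a boolean mask, got dtype {mask.dtype}")
    if mask.size == 0:
        raise ValueError("mask is empty")
    n_true = int(mask.sum())
    frac_true = n_true / mask.size
    return {
        "n_items": int(mask.size),
        "n_true": n_true,
        "frac_true": frac_true,
        "frac_false": 1.0 - frac_true,
    }
```

Whichever of `frac_true` or `frac_false` matches the quoted prune percentage indicates the mask's polarity; REPRODUCE.md and the mask-build scripts in code/ remain the authoritative source.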
License
- Code, masks, configs, metrics, logs, predictions, splits: CC-BY-4.0.
- Checkpoints: released under the same license as the upstream OpenShape weights (MIT — see https://github.com/Colin97/OpenShape_code/blob/master/LICENSE).
Citation
To be added after the double-blind review period.