OpenShape Benchmark Contamination Artifacts (BMVC 2026, Anonymous Release)

Released anonymously for BMVC 2026 double-blind review. Authorship and provenance will be revealed on acceptance.

This release accompanies a BMVC 2026 submission analyzing benchmark contamination in 3D representation learning. It contains the training prune masks, training/eval configs, per-step training and per-epoch evaluation metrics, training logs, NN-proxy predictions, and the best-LVIS checkpoints for the counterfactual training runs that support the paper's headline claims.

What's in this release

This is split across two anonymous repos:

| Repo | Contents |
| --- | --- |
| datasets/<anon-org>/openshape-contamination-axes-data | masks, configs, metrics, training logs, training code, eval splits, NN-proxy predictions |
| <anon-org>/openshape-contamination-axes-checkpoints | best_lvis.pt for 5 counterfactual training runs |

File map (paper claim → file)

The paper analyzes how four kinds of training-set "leakage" affect downstream LVIS zero-shot top-1 accuracy. Each run trains an OpenShape PointBERT encoder against a different prune mask of the training corpus, then evaluates on the same held-out LVIS split. The "T1" configuration is single-stack (PointBERT only); "T2-MLP" adds a small text-tower MLP and is the three-stack configuration behind the paper's primary reported numbers.
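
As a rough illustration of how a boolean prune mask relates to the training corpus, the sketch below summarizes a mask and lists the indices that survive pruning. It assumes a 1-D mask with one entry per training item, and it treats `True` as "keep" by default. This README does not state the mask polarity, so the polarity is exposed as a parameter. The helper name `summarize_mask` is hypothetical and is not part of the bundled code.

```python
import numpy as np

def summarize_mask(mask, true_means_keep=True):
    """Summarize a boolean prune mask.

    ASSUMPTION: whether True means "keep" or "pruned" is not documented
    in this README, so the caller chooses via `true_means_keep`.
    """
    mask = np.asarray(mask, dtype=bool).ravel()
    keep = mask if true_means_keep else ~mask
    return {
        "n_total": int(keep.size),
        "n_kept": int(keep.sum()),
        "pruned_fraction": float(1.0 - keep.mean()) if keep.size else 0.0,
        "kept_indices": np.flatnonzero(keep),
    }

# Synthetic mask for illustration; a real one would come from
# np.load("masks/run_C.npy") inside the downloaded data repo.
demo_mask = np.array([True, False, True, True])
summary = summarize_mask(demo_mask)
print(summary["n_kept"], summary["pruned_fraction"])
```

Comparing `pruned_fraction` against the prune percentages quoted in the file map is one way to check which polarity a given mask uses.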

| Paper claim | Run | Files in this release |
| --- | --- | --- |
| C43 Δ_total = 14.27 pp full→pruned, headline result | Run-0 vs Run-A vs Run-C (T2-MLP) | metrics/run_0_t2_mlp_metrics.jsonl, metrics/run_c_v3_t2_mlp_metrics.jsonl, checkpoints/run_0_t2_mlp_best_lvis.pt, checkpoints/run_c_v3_t2_mlp_best_lvis.pt, masks/run_0.npy, masks/run_C.npy |
| Run-A (52% self-class prune) | Run-A T2-MLP | mask + log + config bundled; T2-MLP checkpoint not available for this release (see "What's NOT here") |
| Run-B v2 reframed (2.7% per-item-impact prune) | Run-B v2 T2-MLP | checkpoints/run_b_v2_t2_mlp_best_lvis.pt, metrics/run_b_v2_t2_mlp_metrics.jsonl, masks/run_B.npy, configs/run_b_v2_t2_mlp_config.yaml |
| Run-C T2-MLP v3 best 45.97 LVIS top-1 | Run-C v3 T2-MLP | checkpoints/run_c_v3_t2_mlp_best_lvis.pt, metrics/run_c_v3_t2_mlp_metrics.jsonl, configs/run_c_v3_t2_mlp_config.yaml |
| Run-NN matched-volume NN-axis ablation | Run-NN T1 + T2-MLP | checkpoints/run_nn_t1_best_lvis.pt, checkpoints/run_nn_t2_mlp_best_lvis.pt, metrics + configs |
| Run-0 three-stack baseline | Run-0 T2-MLP | checkpoints/run_0_t2_mlp_best_lvis.pt, metrics/run_0_t2_mlp_metrics.jsonl, configs/run_0_t2_mlp_config.yaml |
| Own-caption proxy (Table 1) | NN-proxy LVIS assignments | predictions/lvis_nn_assignments.parquet |
| Captioner sweep (G4_C, supplementary) | NN-proxy under 4 caption corpora | predictions/captioner_sweep_per_object.parquet |
| Uni3D replication gap decomposition | per-object preds + decomposition | predictions/uni3d_per_object_preds.parquet, predictions/uni3d_gap_decomp.json |
| LVIS eval split | held-out LVIS objects (UID → class) | splits/lvis_eval.json |

Full per-row mapping is in CLAIM_TO_FILE_MAP.md (copied from the paper supplementary).
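
The metrics files in the map above are JSONL, so one plausible way to recover a run's best evaluation score and a percentage-point gap between two runs is sketched below. The field name `lvis_top1` is a placeholder, not a documented key; check the actual keys in the `.jsonl` files. The sketch also assumes scores are stored in percent and that the headline gap compares best-of-run values, neither of which this README confirms. REPRODUCE.md is the authoritative runbook.

```python
import json

def best_metric(lines, key):
    """Return the largest numeric value stored under `key` in JSONL rows.

    `lines` is any iterable of JSON strings, e.g. an open file handle.
    Rows without `key` (such as train-step rows) are skipped.
    """
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        value = row.get(key) if isinstance(row, dict) else None
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            best = value if best is None else max(best, value)
    return best

def delta_pp(full_lines, pruned_lines, key):
    """Best-of-run gap (full minus pruned), assuming `key` is in percent."""
    full = best_metric(full_lines, key)
    pruned = best_metric(pruned_lines, key)
    if full is None or pruned is None:
        raise ValueError(f"no numeric '{key}' values found")
    return full - pruned

# Synthetic rows for illustration only; these are NOT values from the release.
KEY = "lvis_top1"  # hypothetical field name
full_rows = [
    '{"step": 1, "loss": 2.0}',
    '{"epoch": 1, "lvis_top1": 40.0}',
    '{"epoch": 2, "lvis_top1": 50.0}',
]
pruned_rows = [
    '{"epoch": 1, "lvis_top1": 30.0}',
    '{"epoch": 2, "lvis_top1": 25.0}',
]
demo_gap = delta_pp(full_rows, pruned_rows, KEY)
print(demo_gap)
```

With real files, pass open handles instead of lists, for example `with open("metrics/run_0_t2_mlp_metrics.jsonl") as fh: best_metric(fh, KEY)`.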

Repo layout

masks/                      4 × .npy boolean prune masks (one per run)
configs/                    Training YAML configs (one per checkpoint)
metrics/                    Per-step train + per-epoch eval JSONL logs
checkpoints/                best_lvis.pt for 5 runs (Model repo only)
code/                       Training launcher + losses + mask-build scripts
predictions/                NN-proxy per-UID predictions + Uni3D gap decomposition
splits/                     LVIS eval split + ModelNet40 test split
logs/                       Raw training logs (.log) for all 11 train runs
README.md                   This file
CLAIM_TO_FILE_MAP.md        Per-paper-claim file mapping
REPRODUCE.md                Runbook for reproducing every numeric claim
CITATIONS.md                External data dependencies (Cap3D, etc.)
LICENSE                     CC-BY-4.0
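
A quick sanity check after downloading the data repo is to confirm that the directories listed above are present. The sketch below does this with the standard library only. The helper name `missing_dirs` is hypothetical, and the expected list omits `checkpoints/` because the layout marks it as model-repo only.

```python
import tempfile
from pathlib import Path

# Directories the layout above lists for the data repo.
DATA_REPO_DIRS = ("masks", "configs", "metrics", "code",
                  "predictions", "splits", "logs")

def missing_dirs(root, expected=DATA_REPO_DIRS):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]

# Demo against a throwaway directory that contains only two of the folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("masks", "configs"):
        (Path(tmp) / name).mkdir()
    demo_missing = missing_dirs(tmp)
print(demo_missing)
```

An empty list from `missing_dirs(data_path)` suggests the snapshot matches the layout at the directory level; it does not verify individual files.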

What's NOT here

  • Checkpoints for Run-A T1, Run-A T2-MLP, Run-B v2 T1, Run-C T1, and Run-0 T1: these were written to temporary pod scratch storage that was reclaimed before a snapshot was taken. Per-step metrics, training logs, masks, and configs are included in this release for all of them, so the runs can be reproduced from the bundled code (see REPRODUCE.md).
  • OpenShape training corpus point clouds: not redistributed (~700 GB); download from the original OpenShape release per CITATIONS.md.
  • Cap3D caption corpora: required for the own-caption proxy (Table 1); download per CITATIONS.md. Reproducible aggregates for the captioner-sweep rows are bundled at data/audit/G6_captioner_sweep_own.md (in the supplementary ZIP, not this HF repo).

How to load

from huggingface_hub import snapshot_download
import numpy as np
import torch

# Data repo (masks, configs, metrics, logs, predictions, splits)
data_path = snapshot_download(
    repo_id="<anon-org>/openshape-contamination-axes-data",
    repo_type="dataset",
)
mask = np.load(f"{data_path}/masks/run_C.npy")  # boolean prune mask
with open(f"{data_path}/configs/run_c_v3_t2_mlp_config.yaml") as fh:
    config = fh.read()  # raw YAML text; parse with a YAML library if needed

# Model repo (repo_type defaults to "model")
ckpt_path = snapshot_download(
    repo_id="<anon-org>/openshape-contamination-axes-checkpoints",
)
state = torch.load(f"{ckpt_path}/run_c_v3_t2_mlp_best_lvis.pt", map_location="cpu")
# State dict keys follow the standard OpenShape PointBERT layout
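
The LVIS eval split is described in the file map as a UID → class mapping, so a minimal sketch for inspecting it might look like the following. It assumes `splits/lvis_eval.json` is a flat JSON object whose keys are UIDs and whose values are class names; the actual schema may differ. The helper name `class_counts` is hypothetical.

```python
import json
from collections import Counter

def class_counts(uid_to_class):
    """Count held-out objects per class in a UID -> class mapping."""
    return Counter(uid_to_class.values())

# Synthetic stand-in for the real file. With the release downloaded, load it via:
#   with open(f"{data_path}/splits/lvis_eval.json") as fh:
#       uid_to_class = json.load(fh)
demo_split = json.loads('{"uid_a": "chair", "uid_b": "chair", "uid_c": "lamp"}')
counts = class_counts(demo_split)
print(len(demo_split), counts.most_common(1))
```

Per-class counts are useful when interpreting top-1 accuracy, since rare classes contribute few objects to the overall score.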

License

CC-BY-4.0 (see the LICENSE file).

Citation

To be added after the double-blind review period.
