psn-mechanism-benchmark-small

  • passes_gate: False (validity=True, confirmatory=False)
  • full_passed: True
  • credited mechanisms: 2/4
  • chances: {'state': 0.16666666666666666, 'plan': 0.25, 'task': 0.20833333333333331}

Model comparison (task_acc = mean(state_acc, plan_acc))

| model | group | params | carried_floats | task_acc | state_acc | plan_acc |
|---|---|---|---|---|---|---|
| psn | candidate | 2569365 | 1728 | 0.447 | 0.475 | 0.418 |
| mlp | fair | 2657369 | 64 | 0.224 | 0.182 | 0.266 |
| gru | fair | 2486297 | 896 | 0.332 | 0.373 | 0.290 |
| mem_tf | fair | 2442329 | 10368 | 0.267 | 0.282 | 0.253 |
| windowed_tf | fair | 2431961 | 1024 | 0.238 | 0.223 | 0.253 |
| full_tf | ceiling | 2431961 | 3072* | 0.213 | 0.210 | 0.216 |
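As a quick consistency check on the table above, the following sketch recomputes task_acc as the mean of state_acc and plan_acc. The values are the rounded figures copied from the table, so recomputed numbers can differ from the reported ones in the third decimal place; the function name `task_acc` and the `rows` layout are illustrative choices, not part of the benchmark code.

```python
# Rounded (state_acc, plan_acc, reported task_acc) values copied from the table.
rows = {
    "psn": (0.475, 0.418, 0.447),
    "mlp": (0.182, 0.266, 0.224),
    "gru": (0.373, 0.290, 0.332),
    "mem_tf": (0.282, 0.253, 0.267),
    "windowed_tf": (0.223, 0.253, 0.238),
    "full_tf": (0.210, 0.216, 0.213),
}


def task_acc(state_acc: float, plan_acc: float) -> float:
    """task_acc = mean(state_acc, plan_acc), as stated in the section title."""
    return (state_acc + plan_acc) / 2.0


for name, (state, plan, reported) in rows.items():
    recomputed = task_acc(state, plan)
    print(f"{name:12s} recomputed={recomputed:.4f} reported={reported:.3f}")
```

The same mean applied to the reported chance levels for state (1/6) and plan (1/4) reproduces the reported task chance level of about 0.2083.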

PSN minus best fair baseline (gru): 0.115 (margin needed: 0.08)

Ceiling full_tf task_acc: 0.21321614583333334 (reported, NOT required to beat)
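The margin comparison can be reproduced from the rounded task_acc values with a short sketch. It assumes that the best fair baseline is the highest task_acc among models in the "fair" group and that the check passes when the difference is at least the required margin; the variable names and the use of `>=` are assumptions, not taken from the benchmark source.

```python
# Rounded task_acc per model, with its group, copied from the comparison table.
models = {
    "psn": ("candidate", 0.447),
    "mlp": ("fair", 0.224),
    "gru": ("fair", 0.332),
    "mem_tf": ("fair", 0.267),
    "windowed_tf": ("fair", 0.238),
    "full_tf": ("ceiling", 0.213),
}
MARGIN_NEEDED = 0.08  # "margin needed" as reported

# Only the "fair" group is compared; the ceiling model is reported but not required to beat.
fair = {name: acc for name, (group, acc) in models.items() if group == "fair"}
best_name = max(fair, key=fair.get)

margin = models["psn"][1] - fair[best_name]
beats = margin >= MARGIN_NEEDED  # assumed comparison operator

print(f"best fair baseline: {best_name}, margin: {margin:.3f}, beats: {beats}")
```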

Confirmatory checks

  • PASS full_learns_state
  • PASS full_learns_plan
  • PASS full_decisive
  • PASS param_match_fair_baselines
  • PASS beats_fair_baselines_by_margin
  • FAIL enough_mechanisms_bite

Ablation credit (per-mechanism axis)

| ablation | axis | full | ablated | delta | min_delta | credited |
|---|---|---|---|---|---|---|
| no_m | state_acc | 0.475 | 0.307 | 0.168 | 0.05 | yes |
| no_hierarchy | state_acc | 0.475 | 0.502 | -0.027 | 0.05 | no |
| no_slow_level | state_acc | 0.475 | 0.396 | 0.079 | 0.05 | yes |
| no_rollout | plan_acc | 0.418 | 0.376 | 0.042 | 0.1 | no |
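The credit column is consistent with a simple rule: delta is the full model's score minus the ablated score on the mechanism's axis, and a mechanism is credited when delta reaches min_delta. The sketch below applies that rule to the rounded table values. The rule itself (including `>=` rather than `>`) is inferred from the table, and the names `credited` and `ablations` are illustrative.

```python
# (ablation, axis, full, ablated, min_delta), copied from the ablation table.
ablations = [
    ("no_m", "state_acc", 0.475, 0.307, 0.05),
    ("no_hierarchy", "state_acc", 0.475, 0.502, 0.05),
    ("no_slow_level", "state_acc", 0.475, 0.396, 0.05),
    ("no_rollout", "plan_acc", 0.418, 0.376, 0.1),
]


def credited(full: float, ablated: float, min_delta: float) -> bool:
    """Assumed rule: removing the mechanism must cost at least min_delta."""
    return (full - ablated) >= min_delta


results = {
    name: credited(full, ablated, min_delta)
    for name, _axis, full, ablated, min_delta in ablations
}
n_credited = sum(results.values())

for name, ok in results.items():
    print(f"{name:14s} credited={'yes' if ok else 'no'}")
print(f"credited mechanisms: {n_credited}/{len(ablations)}")
```

Under this rule no_hierarchy is not credited because the ablated model scores slightly higher than the full model (negative delta), and no_rollout falls short of its larger 0.1 threshold.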