Credit-assignment artifacts

Weights/vectors, datasets, and per-model results for the narrated credit-assignment study. Code, figures, and the interactive Craftax viz live on GitHub: DavidDemitriAfrica/hackasack. Companion to davidafrica/functional-wellbeing (the welfare axis used here).

Contents

  • artifacts/credit_v1_*: v_credit.pt credit direction per model (9 families, 1.7B–35B), plus analysis.json, steer.json, miscredit.json, repeat_choice.json, align.json, labels.parquet.
  • artifacts/credit_v2*_*, maze_*: confound (raw-rate-hard) + ecological-maze runs with stats.json.
  • artifacts/conflict_*, reasoning_*, behavioral_axis_*: the structural-vs-behavioral / framing results (incl. v_struct.npy, v_behav.npy).
  • artifacts/train_*, rlpay_*, syco_*, craftax_*: SFT/RL training, sycophancy, and Craftax results.
  • datasets/*.parquet: the generated stimulus sets (credit_v1/v2, maze, craftax).
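
The layout above can be navigated programmatically. The following is a minimal sketch, not part of the repo: it maps a file path to the group it belongs to using the glob patterns from the list. The group labels, the `classify` helper, and the example directory names (such as `credit_v1_modelA`) are hypothetical, and it assumes the bare `maze_*`, `reasoning_*`, etc. patterns all sit under `artifacts/`.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the Contents list. The group labels on the left
# are illustrative names chosen for this sketch, not names used by the repo.
GROUPS = {
    "credit_v1": ["artifacts/credit_v1_*"],
    "confound_maze": ["artifacts/credit_v2*_*", "artifacts/maze_*"],
    "structural_behavioral": [
        "artifacts/conflict_*",
        "artifacts/reasoning_*",
        "artifacts/behavioral_axis_*",
    ],
    "training": [
        "artifacts/train_*",
        "artifacts/rlpay_*",
        "artifacts/syco_*",
        "artifacts/craftax_*",
    ],
    "datasets": ["datasets/*.parquet"],
}


def classify(path):
    """Return the label of the first group whose pattern matches `path`, else None."""
    for group, patterns in GROUPS.items():
        # fnmatchcase is case-sensitive on every platform, and its `*`
        # also matches "/" so files nested inside a run directory match.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return group
    return None


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the patterns.
    for example in [
        "artifacts/credit_v1_modelA/v_credit.pt",
        "artifacts/maze_modelA/stats.json",
        "datasets/credit_v1.parquet",
    ]:
        print(example, "->", classify(example))
```

Passing a listing of the repository's files through `classify` gives a quick per-group inventory before downloading anything large.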

Raw activations.pt files were not retained: they are large and can be regenerated from the datasets via credit.extract_credit in the GitHub repo.
