Credit-assignment artifacts

Weights/vectors, datasets, and per-model results for the narrated credit-assignment study. Code, figures, and the interactive Craftax visualization live on GitHub: DavidDemitriAfrica/hackasack. This repository is a companion to davidafrica/functional-wellbeing (the welfare axis used here).

artifacts/credit_v1_* — v_credit.pt credit direction per model (9 families, 1.7B–35B), plus analysis.json, steer.json, miscredit.json, repeat_choice.json, align.json, labels.parquet.
artifacts/credit_v2*_*, maze_* — confound (raw-rate-hard) + ecological-maze runs with stats.json.
artifacts/conflict_*, reasoning_*, behavioral_axis_* — the structural-vs-behavioral / framing results (incl. v_struct.npy, v_behav.npy).
artifacts/train_*, rlpay_*, syco_*, craftax_* — SFT/RL training, sycophancy, and Craftax results.
datasets/*.parquet — the generated stimulus sets (credit_v1/v2, maze, craftax).
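
The glob patterns in the listing above can be used to sort a list of repository files into artifact families. The sketch below is a minimal illustration and not part of the study code. The family labels, the `group_by_family` helper, and the example paths (including the `modelA` suffix) are hypothetical, and it assumes that `maze_*` sits under `artifacts/` like the other runs. In practice the path list could come from a repository file listing, for example via `huggingface_hub.list_repo_files`.

```python
from fnmatch import fnmatch

# Glob patterns copied from the listing above. The family names are our own
# labels, and placing maze_* under artifacts/ is an assumption.
FAMILIES = {
    "credit_v1": ["artifacts/credit_v1_*"],
    "confound_maze": ["artifacts/credit_v2*_*", "artifacts/maze_*"],
    "framing": [
        "artifacts/conflict_*",
        "artifacts/reasoning_*",
        "artifacts/behavioral_axis_*",
    ],
    "training": [
        "artifacts/train_*",
        "artifacts/rlpay_*",
        "artifacts/syco_*",
        "artifacts/craftax_*",
    ],
    "datasets": ["datasets/*.parquet"],
}


def group_by_family(paths):
    """Assign each path to the first family whose pattern matches it.

    Paths that match no pattern are collected under "other".
    Note that fnmatch's `*` also matches "/", so a pattern such as
    "artifacts/credit_v1_*" covers files nested inside that directory.
    """
    groups = {name: [] for name in FAMILIES}
    groups["other"] = []
    for path in paths:
        for name, patterns in FAMILIES.items():
            if any(fnmatch(path, pattern) for pattern in patterns):
                groups[name].append(path)
                break
        else:
            groups["other"].append(path)
    return groups


# Hypothetical file listing, for illustration only.
example_paths = [
    "artifacts/credit_v1_modelA/v_credit.pt",
    "artifacts/credit_v2b_modelA/stats.json",
    "artifacts/maze_modelA/stats.json",
    "artifacts/conflict_modelA/v_struct.npy",
    "datasets/maze.parquet",
    "README.md",
]
groups = group_by_family(example_paths)
```

Matching on the first family that fits keeps each file in exactly one group, which is convenient when downloading or summarising one family at a time.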

Raw activations.pt files were not retained: they are large and can be regenerated from the datasets via credit.extract_credit in the GitHub repo.
