phanerozoic committed · verified
Commit 81d1bef · 1 Parent(s): a806d66

Initialize repo with Stage 0 baseline

README.md ADDED
@@ -0,0 +1,24 @@
1
+ # 1-Parameter Classifier
2
+
3
+ Progressively reducing the model budget for image-level person classification on EUPE-ViT-B features. Each stage is a deeper reduction or transformation of the previous. The classifier shrinks across stages while the backbone it draws features from is attacked in parallel.
4
+
5
+ ## Stage 0: Baseline
6
+
7
+ A 1-free-parameter image-level person classifier on the frozen EUPE-ViT-B backbone. The classifier reads 20 pre-selected person-positive and 20 pre-selected person-negative feature dimensions, sums the positives, subtracts the negatives, and compares the result to one learned threshold. F1 = 0.889 on COCO val 2017 image-level person presence, measured through the live Argus forward pass at 768 pixel input.
8
+
9
+ See [`stage_0/`](stage_0/) for the classifier config, discovery pipeline, and full characterization of the person axis in the backbone.
10
+
11
+ ## Roadmap
12
+
13
+ | Stage | Name | What changes |
14
+ |---|---|---|
15
+ | 0 | Baseline 1-param classifier | Uses the full EUPE-ViT-B backbone unchanged |
16
+ | 1 | Output-channel pruning | Keep only the 100 feature dims the classifier reads |
17
+ | 2 | Attention-head pruning | Ablate heads that do not contribute to those 100 dims |
18
+ | 3 | Depth reduction | Drop transformer blocks that do not route signal to the 100 dims |
19
+ | 4 | Specialist backbone | Train a small student that emits only the 100 target dims |
20
+ | 5 | Circuit-level synthesis | Synthesize the entire fixed-weight pipeline to gates and dead-code eliminate everything that does not reach the classifier output |
21
+
22
+ ## Source backbone
23
+
24
+ EUPE-ViT-B from Meta FAIR ([arXiv:2603.22387](https://arxiv.org/abs/2603.22387), Zhu et al., March 2026), distilled from PEcore-G + PElang-G + DINOv3-H+ via a 1.9B proxy teacher. License: FAIR Research License (non-commercial). The 1-parameter classifier is an artifact derived from that backbone's feature geometry.
stage_0/README.md ADDED
@@ -0,0 +1,41 @@
1
+ # Stage 0: Baseline 1-Parameter Classifier
2
+
3
+ Image-level person classifier on frozen EUPE-ViT-B features. One free scalar parameter.
4
+
5
+ ## Classifier
6
+
7
+ Given a 768×768-pixel input image, forward it through EUPE-ViT-B, take the 2304 patch tokens at the final layer, apply layernorm across the 768-channel axis, and max-pool across patches to get a single 768-D vector per image. The classifier reads 40 of those 768 dimensions: 20 person-positive, 20 person-negative. Sum the positives, subtract the negatives, and compare against one learned threshold.
8
+
9
+ ```python
10
+ # pseudocode
11
+ patches = backbone(image)["x_norm_patchtokens"] # (2304, 768)
12
+ pooled = layernorm(patches, 768).max(dim=0) # max over patches -> (768,)
13
+ score = pooled[pos_dims].sum() - pooled[neg_dims].sum()
14
+ pred = score > threshold # bool
15
+ ```
16
+
17
+ The dim lists (`pos_dims`, `neg_dims`) and the learned `threshold` are in `classifier.json`.
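+
+ For cached features, the rule can be applied without the backbone. A minimal sketch (it assumes `pooled` is a precomputed 768-D layernormed, max-pooled vector, e.g. produced as in `infer.py`):
+
+ ```python
+ import json
+ import numpy as np
+
+ cfg = json.load(open("classifier.json"))
+
+ def classify(pooled: np.ndarray) -> bool:
+     # pooled: (768,) layernormed, max-pooled feature vector
+     score = pooled[cfg["pos_dims"]].sum() - pooled[cfg["neg_dims"]].sum()
+     return bool(score > cfg["threshold"])
+ ```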
18
+
19
+ ## Evaluation
20
+
21
+ F1 = 0.889, precision = 0.901, recall = 0.876 on 5000 COCO val 2017 images, measured through the live Argus forward pass at 768 pixel input.
22
+
23
+ See `eval.json`.
24
+
25
+ ## How the dim selection was discovered
26
+
27
+ Three-step process, documented in the artifacts below.
28
+
29
+ **`cojoint_discovery.json`** — Sampled 100,000 random 92-dim subsets of the 768-D EUPE-ViT-B feature space, trained a ridge classifier for each, kept the top 1%, counted dim occurrence frequency across the kept cohort. Dim 48 appeared in 100% of top-1000 subsets. Next strongest (dim 525) appeared in 31%.
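+
+ A compressed sketch of that search. Hedged: it assumes cached pooled features `X_train`, `X_val` of shape `(N, 768)` and binary person labels `y_train`, `y_val`, none of which are defined here; only the sample count, subset size, and kept fraction come from the recorded run:
+
+ ```python
+ import numpy as np
+ from sklearn.linear_model import RidgeClassifier
+ from sklearn.metrics import f1_score
+
+ rng = np.random.default_rng(0)
+ results = []
+ for _ in range(100_000):
+     dims = rng.choice(768, size=92, replace=False)   # one random 92-dim subset
+     clf = RidgeClassifier().fit(X_train[:, dims], y_train)
+     results.append((f1_score(y_val, clf.predict(X_val[:, dims])), dims))
+
+ results.sort(key=lambda r: r[0], reverse=True)
+ kept = [dims for _, dims in results[:1000]]          # top 1% by F1
+ freq = np.bincount(np.concatenate(kept), minlength=768) / len(kept)
+ # in the recorded run: freq[48] == 1.0, freq[525] ≈ 0.31
+ ```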
30
+
31
+ **`characterization.json`** — Five analyses on dim 48 specifically. F1 versus K (dim 48 alone reaches F1 = 0.83). Activation distribution for person-positive versus person-negative images (Cohen's d = 1.98). Per-class activation delta for each of 80 COCO categories. Top-10 frequent-dim pairwise correlation (max |r| = 0.57, mostly independent). Spatial IoU of dim-48 peak activations against ground-truth person boxes (mean IoU = 0.17 — dim 48 is a scene-level signal, not a pixel-level localizer).
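+
+ The separation statistic is a standard two-group Cohen's d. A short sketch of one common form (assuming `pos` and `neg` hold the dim-48 pooled activations for person-positive and person-negative images); plugging in the recorded group stats reproduces a value near the recorded 1.98:
+
+ ```python
+ import numpy as np
+
+ def cohens_d(pos: np.ndarray, neg: np.ndarray) -> float:
+     # pooled standard deviation (simple two-group form)
+     s = np.sqrt((pos.var(ddof=1) + neg.var(ddof=1)) / 2)
+     return float((pos.mean() - neg.mean()) / s)
+
+ # with means 8.11 vs 2.98 and stds 2.39 / 2.77 this gives ~1.98
+ ```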
32
+
33
+ **`compressed_variants.json`** — Leaderboard of 20+ classifier variants ranging from 1 free parameter to 769. Among the 1-free-parameter variants, the ternary ±1 weighting over 50 positive plus 50 negative dims ranks first at F1 = 0.893. The 20+20 variant chosen for Stage 0 is the same recipe at a smaller footprint: F1 = 0.881 on cached features, 0.889 through the live forward pass.
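+
+ Each 1-free-parameter variant only fits its threshold. A sketch of that fit (assuming per-image scores `scores` and ground-truth labels `labels` computed from cached features; the exact search used in the leaderboard is not specified here):
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import f1_score
+
+ def fit_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
+     # candidate thresholds: midpoints between consecutive sorted scores
+     s = np.sort(scores)
+     cand = (s[1:] + s[:-1]) / 2
+     f1s = [f1_score(labels, scores > t) for t in cand]
+     return float(cand[int(np.argmax(f1s))])
+ ```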
34
+
35
+ ## Interpretation
36
+
37
+ Dim 48 is the canonical anthropogenic-scene axis in EUPE-ViT-B. It activates strongly on person scenes and on person-associated objects (sports equipment, wearable accessories, handheld items), and is suppressed on non-human animals and non-anthropogenic structures. Alone it delivers F1 = 0.83 as a 2-parameter classifier. The additional 39 dims stack on mostly orthogonal axes to reach F1 = 0.89 at 1 free parameter.
38
+
39
+ ## Hardware footprint estimate
40
+
41
+ At INT8 precision, the classifier synthesizes to an estimated 2,500–4,100 gates: two Wallace-tree adders (50-input each), one subtractor, one comparator. For reference, a 768-dim INT8 MAC unit is roughly 65,000 gates, and the prior 4,614-parameter multi-output person detector synthesizes to roughly 391,000 gates.
stage_0/characterization.json ADDED
@@ -0,0 +1,542 @@
1
+ {
2
+ "target_dim": 48,
3
+ "A_f1_vs_k": [
4
+ {
5
+ "K": 1,
6
+ "F1": 0.8285356163978577,
7
+ "precision": 0.7985524535179138,
8
+ "recall": 0.8608582615852356,
9
+ "dims": [
10
+ 48
11
+ ]
12
+ },
13
+ {
14
+ "K": 2,
15
+ "F1": 0.8562204241752625,
16
+ "precision": 0.8503633737564087,
17
+ "recall": 0.8621586561203003,
18
+ "dims": [
19
+ 48,
20
+ 525
21
+ ]
22
+ },
23
+ {
24
+ "K": 3,
25
+ "F1": 0.8652572631835938,
26
+ "precision": 0.8526936173439026,
27
+ "recall": 0.8781967759132385,
28
+ "dims": [
29
+ 48,
30
+ 525,
31
+ 475
32
+ ]
33
+ },
34
+ {
35
+ "K": 5,
36
+ "F1": 0.8649698495864868,
37
+ "precision": 0.8608844876289368,
38
+ "recall": 0.8690940737724304,
39
+ "dims": [
40
+ 48,
41
+ 525,
42
+ 475,
43
+ 323,
44
+ 240
45
+ ]
46
+ },
47
+ {
48
+ "K": 10,
49
+ "F1": 0.9092138409614563,
50
+ "precision": 0.9465290904045105,
51
+ "recall": 0.8747290968894958,
52
+ "dims": [
53
+ 48,
54
+ 525,
55
+ 475,
56
+ 323,
57
+ 240,
58
+ "..."
59
+ ]
60
+ },
61
+ {
62
+ "K": 20,
63
+ "F1": 0.9288457632064819,
64
+ "precision": 0.9698113203048706,
65
+ "recall": 0.8912007212638855,
66
+ "dims": [
67
+ 48,
68
+ 525,
69
+ 475,
70
+ 323,
71
+ 240,
72
+ "..."
73
+ ]
74
+ },
75
+ {
76
+ "K": 30,
77
+ "F1": 0.9349648356437683,
78
+ "precision": 0.9795821309089661,
79
+ "recall": 0.8942349553108215,
80
+ "dims": [
81
+ 48,
82
+ 525,
83
+ 475,
84
+ 323,
85
+ 240,
86
+ "..."
87
+ ]
88
+ },
89
+ {
90
+ "K": 50,
91
+ "F1": 0.9407240152359009,
92
+ "precision": 0.9839091300964355,
93
+ "recall": 0.9011703729629517,
94
+ "dims": [
95
+ 48,
96
+ 525,
97
+ 475,
98
+ 323,
99
+ 240,
100
+ "..."
101
+ ]
102
+ },
103
+ {
104
+ "K": 92,
105
+ "F1": 0.9444695711135864,
106
+ "precision": 0.98539799451828,
107
+ "recall": 0.9068053960800171,
108
+ "dims": [
109
+ 48,
110
+ 525,
111
+ 475,
112
+ 323,
113
+ 240,
114
+ "..."
115
+ ]
116
+ },
117
+ {
118
+ "K": 184,
119
+ "F1": 0.9485046863555908,
120
+ "precision": 0.9855140447616577,
121
+ "recall": 0.9141742587089539,
122
+ "dims": [
123
+ 48,
124
+ 525,
125
+ 475,
126
+ 323,
127
+ 240,
128
+ "..."
129
+ ]
130
+ }
131
+ ],
132
+ "B_dim48_distribution": {
133
+ "person_pos": {
134
+ "n": 2693,
135
+ "mean": 8.113646507263184,
136
+ "std": 2.3924715518951416,
137
+ "min": -1.2812166213989258,
138
+ "p25": 7.193147659301758,
139
+ "p50": 8.808687210083008,
140
+ "p75": 9.730491638183594,
141
+ "max": 12.061566352844238
142
+ },
143
+ "person_neg": {
144
+ "n": 2307,
145
+ "mean": 2.982386589050293,
146
+ "std": 2.7675836086273193,
147
+ "min": -3.072082042694092,
148
+ "p25": 0.5744761228561401,
149
+ "p50": 3.059739589691162,
150
+ "p75": 5.2233123779296875,
151
+ "max": 11.300838470458984
152
+ },
153
+ "separation_cohen_d": 1.983604907989502
154
+ },
155
+ "C_per_class_top15": [
156
+ {
157
+ "class": "person",
158
+ "cat_id": 1,
159
+ "n_imgs": 2693,
160
+ "dim48_mean_inclass": 8.1136474609375,
161
+ "dim48_mean_outclass": 2.982386350631714,
162
+ "delta": 5.131261110305786
163
+ },
164
+ {
165
+ "class": "tennis racket",
166
+ "cat_id": 43,
167
+ "n_imgs": 167,
168
+ "dim48_mean_inclass": 9.163715362548828,
169
+ "dim48_mean_outclass": 5.627990245819092,
170
+ "delta": 3.5357251167297363
171
+ },
172
+ {
173
+ "class": "tie",
174
+ "cat_id": 32,
175
+ "n_imgs": 145,
176
+ "dim48_mean_inclass": 8.980120658874512,
177
+ "dim48_mean_outclass": 5.6494951248168945,
178
+ "delta": 3.330625534057617
179
+ },
180
+ {
181
+ "class": "baseball glove",
182
+ "cat_id": 40,
183
+ "n_imgs": 100,
184
+ "dim48_mean_inclass": 8.982823371887207,
185
+ "dim48_mean_outclass": 5.680027008056641,
186
+ "delta": 3.3027963638305664
187
+ },
188
+ {
189
+ "class": "skateboard",
190
+ "cat_id": 41,
191
+ "n_imgs": 127,
192
+ "dim48_mean_inclass": 8.923754692077637,
193
+ "dim48_mean_outclass": 5.663267135620117,
194
+ "delta": 3.2604875564575195
195
+ },
196
+ {
197
+ "class": "baseball bat",
198
+ "cat_id": 39,
199
+ "n_imgs": 97,
200
+ "dim48_mean_inclass": 8.886703491210938,
201
+ "dim48_mean_outclass": 5.683949947357178,
202
+ "delta": 3.2027535438537598
203
+ },
204
+ {
205
+ "class": "backpack",
206
+ "cat_id": 27,
207
+ "n_imgs": 228,
208
+ "dim48_mean_inclass": 8.655838012695312,
209
+ "dim48_mean_outclass": 5.607059001922607,
210
+ "delta": 3.048779010772705
211
+ },
212
+ {
213
+ "class": "remote",
214
+ "cat_id": 75,
215
+ "n_imgs": 145,
216
+ "dim48_mean_inclass": 8.703699111938477,
217
+ "dim48_mean_outclass": 5.657751083374023,
218
+ "delta": 3.045948028564453
219
+ },
220
+ {
221
+ "class": "toothbrush",
222
+ "cat_id": 90,
223
+ "n_imgs": 34,
224
+ "dim48_mean_inclass": 8.746152877807617,
225
+ "dim48_mean_outclass": 5.725543022155762,
226
+ "delta": 3.0206098556518555
227
+ },
228
+ {
229
+ "class": "handbag",
230
+ "cat_id": 31,
231
+ "n_imgs": 292,
232
+ "dim48_mean_inclass": 8.576542854309082,
233
+ "dim48_mean_outclass": 5.570532321929932,
234
+ "delta": 3.0060105323791504
235
+ },
236
+ {
237
+ "class": "sports ball",
238
+ "cat_id": 37,
239
+ "n_imgs": 169,
240
+ "dim48_mean_inclass": 8.645074844360352,
241
+ "dim48_mean_outclass": 5.644669532775879,
242
+ "delta": 3.0004053115844727
243
+ },
244
+ {
245
+ "class": "hair drier",
246
+ "cat_id": 89,
247
+ "n_imgs": 9,
248
+ "dim48_mean_inclass": 8.637924194335938,
249
+ "dim48_mean_outclass": 5.74086856842041,
250
+ "delta": 2.8970556259155273
251
+ },
252
+ {
253
+ "class": "skis",
254
+ "cat_id": 35,
255
+ "n_imgs": 120,
256
+ "dim48_mean_inclass": 8.562700271606445,
257
+ "dim48_mean_outclass": 5.676822185516357,
258
+ "delta": 2.885878086090088
259
+ },
260
+ {
261
+ "class": "cell phone",
262
+ "cat_id": 77,
263
+ "n_imgs": 214,
264
+ "dim48_mean_inclass": 8.483532905578613,
265
+ "dim48_mean_outclass": 5.623681545257568,
266
+ "delta": 2.859851360321045
267
+ },
268
+ {
269
+ "class": "snowboard",
270
+ "cat_id": 36,
271
+ "n_imgs": 49,
272
+ "dim48_mean_inclass": 8.451048851013184,
273
+ "dim48_mean_outclass": 5.71931266784668,
274
+ "delta": 2.731736183166504
275
+ }
276
+ ],
277
+ "C_per_class_bottom15": [
278
+ {
279
+ "class": "traffic light",
280
+ "cat_id": 10,
281
+ "n_imgs": 191,
282
+ "dim48_mean_inclass": 4.146858215332031,
283
+ "dim48_mean_outclass": 5.809599876403809,
284
+ "delta": -1.6627416610717773
285
+ },
286
+ {
287
+ "class": "boat",
288
+ "cat_id": 9,
289
+ "n_imgs": 121,
290
+ "dim48_mean_inclass": 4.040976047515869,
291
+ "dim48_mean_outclass": 5.788370132446289,
292
+ "delta": -1.74739408493042
293
+ },
294
+ {
295
+ "class": "fire hydrant",
296
+ "cat_id": 11,
297
+ "n_imgs": 86,
298
+ "dim48_mean_inclass": 3.6824471950531006,
299
+ "dim48_mean_outclass": 5.782199382781982,
300
+ "delta": -2.099752187728882
301
+ },
302
+ {
303
+ "class": "clock",
304
+ "cat_id": 85,
305
+ "n_imgs": 204,
306
+ "dim48_mean_inclass": 3.664764642715454,
307
+ "dim48_mean_outclass": 5.83461332321167,
308
+ "delta": -2.169848680496216
309
+ },
310
+ {
311
+ "class": "train",
312
+ "cat_id": 7,
313
+ "n_imgs": 157,
314
+ "dim48_mean_inclass": 3.4102745056152344,
315
+ "dim48_mean_outclass": 5.821805477142334,
316
+ "delta": -2.4115309715270996
317
+ },
318
+ {
319
+ "class": "sheep",
320
+ "cat_id": 20,
321
+ "n_imgs": 65,
322
+ "dim48_mean_inclass": 3.174736261367798,
323
+ "dim48_mean_outclass": 5.779951095581055,
324
+ "delta": -2.605214834213257
325
+ },
326
+ {
327
+ "class": "elephant",
328
+ "cat_id": 22,
329
+ "n_imgs": 89,
330
+ "dim48_mean_inclass": 3.023428201675415,
331
+ "dim48_mean_outclass": 5.795424938201904,
332
+ "delta": -2.7719967365264893
333
+ },
334
+ {
335
+ "class": "parking meter",
336
+ "cat_id": 14,
337
+ "n_imgs": 37,
338
+ "dim48_mean_inclass": 2.688075065612793,
339
+ "dim48_mean_outclass": 5.768880844116211,
340
+ "delta": -3.080805778503418
341
+ },
342
+ {
343
+ "class": "airplane",
344
+ "cat_id": 5,
345
+ "n_imgs": 97,
346
+ "dim48_mean_inclass": 2.6694958209991455,
347
+ "dim48_mean_outclass": 5.806950092315674,
348
+ "delta": -3.1374542713165283
349
+ },
350
+ {
351
+ "class": "stop sign",
352
+ "cat_id": 13,
353
+ "n_imgs": 69,
354
+ "dim48_mean_inclass": 2.509443998336792,
355
+ "dim48_mean_outclass": 5.7913737297058105,
356
+ "delta": -3.2819297313690186
357
+ },
358
+ {
359
+ "class": "cow",
360
+ "cat_id": 21,
361
+ "n_imgs": 87,
362
+ "dim48_mean_inclass": 2.1573596000671387,
363
+ "dim48_mean_outclass": 5.809632778167725,
364
+ "delta": -3.652273178100586
365
+ },
366
+ {
367
+ "class": "bird",
368
+ "cat_id": 16,
369
+ "n_imgs": 125,
370
+ "dim48_mean_inclass": 1.5200526714324951,
371
+ "dim48_mean_outclass": 5.854443073272705,
372
+ "delta": -4.33439040184021
373
+ },
374
+ {
375
+ "class": "giraffe",
376
+ "cat_id": 25,
377
+ "n_imgs": 101,
378
+ "dim48_mean_inclass": 0.5355944633483887,
379
+ "dim48_mean_outclass": 5.8535051345825195,
380
+ "delta": -5.317910671234131
381
+ },
382
+ {
383
+ "class": "zebra",
384
+ "cat_id": 24,
385
+ "n_imgs": 85,
386
+ "dim48_mean_inclass": 0.19826489686965942,
387
+ "dim48_mean_outclass": 5.842027187347412,
388
+ "delta": -5.643762290477753
389
+ },
390
+ {
391
+ "class": "bear",
392
+ "cat_id": 23,
393
+ "n_imgs": 49,
394
+ "dim48_mean_inclass": -0.3452981412410736,
395
+ "dim48_mean_outclass": 5.806369781494141,
396
+ "delta": -6.151667922735214
397
+ }
398
+ ],
399
+ "D_top10_correlation": {
400
+ "dims": [
401
+ 48,
402
+ 525,
403
+ 475,
404
+ 323,
405
+ 240,
406
+ 637,
407
+ 318,
408
+ 472,
409
+ 719,
410
+ 251
411
+ ],
412
+ "corr_matrix": [
413
+ [
414
+ 1.000000238418579,
415
+ 0.28202927112579346,
416
+ -0.03376583009958267,
417
+ 0.24143743515014648,
418
+ -0.013180982321500778,
419
+ 0.1519388109445572,
420
+ 0.21512284874916077,
421
+ 0.12231583148241043,
422
+ 0.15586860477924347,
423
+ 0.14881354570388794
424
+ ],
425
+ [
426
+ 0.28202927112579346,
427
+ 0.9999998211860657,
428
+ 0.2280241698026657,
429
+ 0.08577030897140503,
430
+ 0.20908223092556,
431
+ -0.09965874999761581,
432
+ 0.2280401587486267,
433
+ 0.08895031362771988,
434
+ 0.2658396065235138,
435
+ -0.15059183537960052
436
+ ],
437
+ [
438
+ -0.03376583009958267,
439
+ 0.2280241698026657,
440
+ 1.0000001192092896,
441
+ 0.061812229454517365,
442
+ 0.570030927658081,
443
+ -0.011771553196012974,
444
+ 0.07696891576051712,
445
+ -0.10601639002561569,
446
+ 0.06801445037126541,
447
+ -0.23264755308628082
448
+ ],
449
+ [
450
+ 0.24143743515014648,
451
+ 0.08577030897140503,
452
+ 0.061812229454517365,
453
+ 0.9999997615814209,
454
+ 0.0030930074863135815,
455
+ 0.18994615972042084,
456
+ 0.15969695150852203,
457
+ 0.21833136677742004,
458
+ 0.26542553305625916,
459
+ -0.039861708879470825
460
+ ],
461
+ [
462
+ -0.013180982321500778,
463
+ 0.20908223092556,
464
+ 0.570030927658081,
465
+ 0.0030930074863135815,
466
+ 1.0000003576278687,
467
+ -0.09009360522031784,
468
+ 0.003698738757520914,
469
+ -0.08297917991876602,
470
+ 0.01635875552892685,
471
+ -0.21284666657447815
472
+ ],
473
+ [
474
+ 0.1519388109445572,
475
+ -0.09965874999761581,
476
+ -0.011771553196012974,
477
+ 0.18994615972042084,
478
+ -0.09009360522031784,
479
+ 0.9999999403953552,
480
+ 0.13532496988773346,
481
+ 0.04327913001179695,
482
+ -0.03058011643588543,
483
+ 0.22757190465927124
484
+ ],
485
+ [
486
+ 0.21512284874916077,
487
+ 0.2280401587486267,
488
+ 0.07696891576051712,
489
+ 0.15969695150852203,
490
+ 0.003698738757520914,
491
+ 0.13532496988773346,
492
+ 0.9999998807907104,
493
+ 0.030232839286327362,
494
+ 0.23005540668964386,
495
+ 0.025300158187747
496
+ ],
497
+ [
498
+ 0.12231583148241043,
499
+ 0.08895031362771988,
500
+ -0.10601639002561569,
501
+ 0.21833136677742004,
502
+ -0.08297917991876602,
503
+ 0.04327913001179695,
504
+ 0.030232839286327362,
505
+ 1.000000238418579,
506
+ 0.21879234910011292,
507
+ 0.020776957273483276
508
+ ],
509
+ [
510
+ 0.15586860477924347,
511
+ 0.2658396065235138,
512
+ 0.06801445037126541,
513
+ 0.26542553305625916,
514
+ 0.01635875552892685,
515
+ -0.03058011643588543,
516
+ 0.23005540668964386,
517
+ 0.21879234910011292,
518
+ 0.9999998807907104,
519
+ -0.015971055254340172
520
+ ],
521
+ [
522
+ 0.14881354570388794,
523
+ -0.15059183537960052,
524
+ -0.23264755308628082,
525
+ -0.039861708879470825,
526
+ -0.21284666657447815,
527
+ 0.22757190465927124,
528
+ 0.025300158187747,
529
+ 0.020776957273483276,
530
+ -0.015971055254340172,
531
+ 1.0000001192092896
532
+ ]
533
+ ],
534
+ "max_abs_offdiag": 0.570030927658081
535
+ },
536
+ "E_spatial_localization": {
537
+ "n_sampled_images": 500,
538
+ "mean_iou": 0.16534487868892028,
539
+ "median_iou": 0.15606729686260223,
540
+ "p95_iou": 0.35281437188386916
541
+ }
542
+ }
stage_0/classifier.json ADDED
@@ -0,0 +1,19 @@
1
+ {
2
+ "backbone": "facebook/EUPE-ViT-B",
3
+ "feature_dim": 768,
4
+ "input_resolution": 768,
5
+ "patch_size": 16,
6
+ "patch_grid": [48, 48],
7
+ "preprocessing": "layernorm over 768 channels then max-pool over 2304 patches",
8
+ "pos_dims": [48, 525, 475, 645, 273, 292, 158, 510, 506, 337, 8, 309, 267, 217, 79, 13, 657, 207, 722, 311],
9
+ "neg_dims": [642, 224, 113, 565, 49, 637, 45, 520, 219, 290, 529, 617, 269, 745, 576, 701, 105, 694, 82, 283],
10
+ "pos_weight": 1.0,
11
+ "neg_weight": -1.0,
12
+ "threshold": 25.284494400024414,
13
+ "decision": "sum(feat[pos_dims]) - sum(feat[neg_dims]) > threshold",
14
+ "free_parameters": 1,
15
+ "fixed_parameters": {
16
+ "dim_indices": 40,
17
+ "signs": 40
18
+ }
19
+ }
stage_0/cojoint_discovery.json ADDED
@@ -0,0 +1,500 @@
1
+ {
2
+ "n_sampled": 100000,
3
+ "K": 92,
4
+ "n_kept": 1000,
5
+ "f1_distribution": {
6
+ "min": 0.7833036780357361,
7
+ "p50": 0.8676828145980835,
8
+ "p95": 0.9258173704147339,
9
+ "p99": 0.9300636053085327,
10
+ "max": 0.9373591542243958
11
+ },
12
+ "top_kept_f1_range": [
13
+ 0.9300652146339417,
14
+ 0.9373591542243958
15
+ ],
16
+ "best_single_genome_f1": 0.9373591542243958,
17
+ "best_single_genome_sorted": [
18
+ 2,
19
+ 4,
20
+ 8,
21
+ 21,
22
+ 29,
23
+ 39,
24
+ 48,
25
+ 67,
26
+ 69,
27
+ 75,
28
+ 91,
29
+ 119,
30
+ 124,
31
+ 141,
32
+ 149,
33
+ 157,
34
+ 158,
35
+ 168,
36
+ 177,
37
+ 181,
38
+ 192,
39
+ 198,
40
+ 200,
41
+ 218,
42
+ 220,
43
+ 240,
44
+ 249,
45
+ 250,
46
+ 259,
47
+ 267,
48
+ 269,
49
+ 272,
50
+ 273,
51
+ 275,
52
+ 285,
53
+ 286,
54
+ 297,
55
+ 300,
56
+ 304,
57
+ 318,
58
+ 319,
59
+ 323,
60
+ 328,
61
+ 330,
62
+ 340,
63
+ 373,
64
+ 374,
65
+ 377,
66
+ 379,
67
+ 391,
68
+ 393,
69
+ 403,
70
+ 419,
71
+ 429,
72
+ 433,
73
+ 436,
74
+ 454,
75
+ 463,
76
+ 476,
77
+ 520,
78
+ 524,
79
+ 530,
80
+ 537,
81
+ 548,
82
+ 550,
83
+ 558,
84
+ 560,
85
+ 570,
86
+ 574,
87
+ 582,
88
+ 590,
89
+ 594,
90
+ 605,
91
+ 606,
92
+ 609,
93
+ 612,
94
+ 617,
95
+ 624,
96
+ 630,
97
+ 646,
98
+ 649,
99
+ 657,
100
+ 660,
101
+ 673,
102
+ 680,
103
+ 700,
104
+ 715,
105
+ 716,
106
+ 737,
107
+ 740,
108
+ 754,
109
+ 755
110
+ ],
111
+ "dim_frequency_top_25": {
112
+ "48": 1.0,
113
+ "525": 0.3140000104904175,
114
+ "475": 0.3110000193119049,
115
+ "323": 0.23400001227855682,
116
+ "240": 0.21700000762939453,
117
+ "637": 0.2120000123977661,
118
+ "318": 0.20600001513957977,
119
+ "472": 0.20200000703334808,
120
+ "719": 0.20100000500679016,
121
+ "251": 0.19700001180171967,
122
+ "506": 0.19500000774860382,
123
+ "13": 0.1940000057220459,
124
+ "281": 0.1900000125169754,
125
+ "113": 0.1860000044107437,
126
+ "331": 0.1860000044107437,
127
+ "224": 0.18200001120567322,
128
+ "382": 0.1810000091791153,
129
+ "767": 0.18000000715255737,
130
+ "605": 0.1770000010728836,
131
+ "642": 0.17500001192092896,
132
+ "157": 0.17400000989437103,
133
+ "155": 0.1680000126361847,
134
+ "273": 0.16700001060962677,
135
+ "207": 0.16500000655651093,
136
+ "510": 0.16500000655651093
137
+ },
138
+ "union_classifier": {
139
+ "n_dims": 768,
140
+ "F1": 0.9616926908493042,
141
+ "precision": 0.989005982875824,
142
+ "recall": 0.9358474016189575
143
+ },
144
+ "cores": {
145
+ "23": [
146
+ 48,
147
+ 525,
148
+ 475,
149
+ 323,
150
+ 240,
151
+ 637,
152
+ 318,
153
+ 472,
154
+ 719,
155
+ 251,
156
+ 506,
157
+ 13,
158
+ 281,
159
+ 113,
160
+ 331,
161
+ 224,
162
+ 382,
163
+ 767,
164
+ 605,
165
+ 642,
166
+ 157,
167
+ 155,
168
+ 273
169
+ ],
170
+ "46": [
171
+ 48,
172
+ 525,
173
+ 475,
174
+ 323,
175
+ 240,
176
+ 637,
177
+ 318,
178
+ 472,
179
+ 719,
180
+ 251,
181
+ 506,
182
+ 13,
183
+ 281,
184
+ 113,
185
+ 331,
186
+ 224,
187
+ 382,
188
+ 767,
189
+ 605,
190
+ 642,
191
+ 157,
192
+ 155,
193
+ 273,
194
+ 207,
195
+ 510,
196
+ 127,
197
+ 245,
198
+ 759,
199
+ 292,
200
+ 531,
201
+ 617,
202
+ 333,
203
+ 527,
204
+ 45,
205
+ 332,
206
+ 574,
207
+ 595,
208
+ 269,
209
+ 158,
210
+ 432,
211
+ 0,
212
+ 119,
213
+ 310,
214
+ 490,
215
+ 557,
216
+ 528
217
+ ],
218
+ "92": [
219
+ 48,
220
+ 525,
221
+ 475,
222
+ 323,
223
+ 240,
224
+ 637,
225
+ 318,
226
+ 472,
227
+ 719,
228
+ 251,
229
+ 506,
230
+ 13,
231
+ 281,
232
+ 113,
233
+ 331,
234
+ 224,
235
+ 382,
236
+ 767,
237
+ 605,
238
+ 642,
239
+ 157,
240
+ 155,
241
+ 273,
242
+ 207,
243
+ 510,
244
+ 127,
245
+ 245,
246
+ 759,
247
+ 292,
248
+ 531,
249
+ 617,
250
+ 333,
251
+ 527,
252
+ 45,
253
+ 332,
254
+ 574,
255
+ 595,
256
+ 269,
257
+ 158,
258
+ 432,
259
+ 0,
260
+ 119,
261
+ 310,
262
+ 490,
263
+ 557,
264
+ 528,
265
+ 545,
266
+ 558,
267
+ 597,
268
+ 645,
269
+ 28,
270
+ 51,
271
+ 79,
272
+ 82,
273
+ 301,
274
+ 126,
275
+ 542,
276
+ 173,
277
+ 90,
278
+ 380,
279
+ 71,
280
+ 384,
281
+ 514,
282
+ 620,
283
+ 630,
284
+ 713,
285
+ 223,
286
+ 516,
287
+ 520,
288
+ 606,
289
+ 17,
290
+ 172,
291
+ 355,
292
+ 393,
293
+ 421,
294
+ 745,
295
+ 68,
296
+ 252,
297
+ 280,
298
+ 334,
299
+ 657,
300
+ 95,
301
+ 111,
302
+ 151,
303
+ 244,
304
+ 543,
305
+ 748,
306
+ 290,
307
+ 363,
308
+ 469,
309
+ 561,
310
+ 591
311
+ ],
312
+ "184": [
313
+ 48,
314
+ 525,
315
+ 475,
316
+ 323,
317
+ 240,
318
+ 637,
319
+ 318,
320
+ 472,
321
+ 719,
322
+ 251,
323
+ 506,
324
+ 13,
325
+ 281,
326
+ 113,
327
+ 331,
328
+ 224,
329
+ 382,
330
+ 767,
331
+ 605,
332
+ 642,
333
+ 157,
334
+ 155,
335
+ 273,
336
+ 207,
337
+ 510,
338
+ 127,
339
+ 245,
340
+ 759,
341
+ 292,
342
+ 531,
343
+ 617,
344
+ 333,
345
+ 527,
346
+ 45,
347
+ 332,
348
+ 574,
349
+ 595,
350
+ 269,
351
+ 158,
352
+ 432,
353
+ 0,
354
+ 119,
355
+ 310,
356
+ 490,
357
+ 557,
358
+ 528,
359
+ 545,
360
+ 558,
361
+ 597,
362
+ 645,
363
+ 28,
364
+ 51,
365
+ 79,
366
+ 82,
367
+ 301,
368
+ 126,
369
+ 542,
370
+ 173,
371
+ 90,
372
+ 380,
373
+ 71,
374
+ 384,
375
+ 514,
376
+ 620,
377
+ 630,
378
+ 713,
379
+ 223,
380
+ 516,
381
+ 520,
382
+ 606,
383
+ 17,
384
+ 172,
385
+ 355,
386
+ 393,
387
+ 421,
388
+ 745,
389
+ 68,
390
+ 252,
391
+ 280,
392
+ 334,
393
+ 657,
394
+ 95,
395
+ 111,
396
+ 151,
397
+ 244,
398
+ 543,
399
+ 748,
400
+ 290,
401
+ 363,
402
+ 469,
403
+ 561,
404
+ 591,
405
+ 96,
406
+ 397,
407
+ 419,
408
+ 425,
409
+ 523,
410
+ 627,
411
+ 734,
412
+ 124,
413
+ 236,
414
+ 454,
415
+ 9,
416
+ 345,
417
+ 358,
418
+ 480,
419
+ 575,
420
+ 736,
421
+ 26,
422
+ 34,
423
+ 65,
424
+ 141,
425
+ 152,
426
+ 176,
427
+ 204,
428
+ 293,
429
+ 305,
430
+ 445,
431
+ 282,
432
+ 389,
433
+ 437,
434
+ 619,
435
+ 162,
436
+ 178,
437
+ 311,
438
+ 394,
439
+ 409,
440
+ 483,
441
+ 496,
442
+ 613,
443
+ 717,
444
+ 50,
445
+ 188,
446
+ 216,
447
+ 325,
448
+ 602,
449
+ 699,
450
+ 740,
451
+ 254,
452
+ 258,
453
+ 299,
454
+ 403,
455
+ 495,
456
+ 603,
457
+ 622,
458
+ 40,
459
+ 83,
460
+ 146,
461
+ 150,
462
+ 232,
463
+ 259,
464
+ 341,
465
+ 392,
466
+ 444,
467
+ 522,
468
+ 567,
469
+ 629,
470
+ 647,
471
+ 25,
472
+ 170,
473
+ 205,
474
+ 267,
475
+ 274,
476
+ 307,
477
+ 411,
478
+ 549,
479
+ 576,
480
+ 618,
481
+ 686,
482
+ 760,
483
+ 8,
484
+ 200,
485
+ 255,
486
+ 353,
487
+ 354,
488
+ 413,
489
+ 455,
490
+ 463,
491
+ 544,
492
+ 721,
493
+ 93,
494
+ 171,
495
+ 225,
496
+ 234
497
+ ]
498
+ },
499
+ "runtime_s": 4.635331869125366
500
+ }
stage_0/compressed_variants.json ADDED
@@ -0,0 +1,228 @@
1
+ {
2
+ "results": [
3
+ {
4
+ "name": "ref: full 768 ridge",
5
+ "params": 769,
6
+ "F1": 0.9598035216331482,
7
+ "precision": 0.9898664355278015,
8
+ "recall": 0.9315127730369568,
9
+ "threshold": null
10
+ },
11
+ {
12
+ "name": "ref: K=92 ridge (cojoint top-92 + bias)",
13
+ "params": 93,
14
+ "F1": 0.9463722109794617,
15
+ "precision": 0.9854528307914734,
16
+ "recall": 0.9102730751037598,
17
+ "threshold": null
18
+ },
19
+ {
20
+ "name": "E: ternary weights top-50 pos vs top-50 neg, threshold",
21
+ "params": 1,
22
+ "F1": 0.8933987617492676,
23
+ "precision": 0.9200735092163086,
24
+ "recall": 0.8682271242141724,
25
+ "threshold": 31.819643020629883
26
+ },
27
+ {
28
+ "name": "C: threshold(sum top-20 pos \u2212 sum top-20 neg)",
29
+ "params": 1,
30
+ "F1": 0.8808632493019104,
31
+ "precision": 0.8952551484107971,
32
+ "recall": 0.8669267296791077,
33
+ "threshold": 24.8664608001709
34
+ },
35
+ {
36
+ "name": "E: ternary weights top-20 pos vs top-20 neg, threshold",
37
+ "params": 1,
38
+ "F1": 0.8808632493019104,
39
+ "precision": 0.8952551484107971,
40
+ "recall": 0.8669267296791077,
41
+ "threshold": 24.86646270751953
42
+ },
43
+ {
44
+ "name": "C: threshold(sum top-10 pos \u2212 sum top-10 neg)",
45
+ "params": 1,
46
+ "F1": 0.8801606893539429,
47
+ "precision": 0.9070836901664734,
48
+ "recall": 0.8547897934913635,
49
+ "threshold": 22.383634567260742
50
+ },
51
+ {
52
+ "name": "E: ternary weights top-10 pos vs top-10 neg, threshold",
53
+ "params": 1,
54
+ "F1": 0.8801606893539429,
55
+ "precision": 0.9070836901664734,
56
+ "recall": 0.8547897934913635,
57
+ "threshold": 22.383630752563477
58
+ },
59
+ {
60
+ "name": "B: threshold(sum top-2 pos dims)",
61
+ "params": 1,
62
+ "F1": 0.8780821561813354,
63
+ "precision": 0.9276410937309265,
64
+ "recall": 0.8335500359535217,
65
+ "threshold": 14.303534507751465
66
+ },
67
+ {
68
+ "name": "B: threshold(sum top-20 pos dims)",
69
+ "params": 1,
70
+ "F1": 0.8683924674987793,
71
+ "precision": 0.8716157078742981,
72
+ "recall": 0.8651928901672363,
73
+ "threshold": 62.11921310424805
74
+ },
75
+ {
76
+ "name": "C: threshold(sum top-5 pos \u2212 sum top-5 neg)",
77
+ "params": 1,
78
+ "F1": 0.8637353181838989,
79
+ "precision": 0.867512047290802,
80
+ "recall": 0.8599913120269775,
81
+ "threshold": 27.896242141723633
82
+ },
83
+ {
84
+ "name": "B: threshold(sum top-10 pos dims)",
85
+ "params": 1,
86
+ "F1": 0.8574432134628296,
87
+ "precision": 0.8325847387313843,
88
+ "recall": 0.883831799030304,
89
+ "threshold": 45.166908264160156
90
+ },
91
+ {
92
+ "name": "A: threshold(dim48)",
93
+ "params": 1,
94
+ "F1": 0.8451337814331055,
95
+ "precision": 0.8776844143867493,
96
+ "recall": 0.8149111270904541,
97
+ "threshold": 6.3439040184021
98
+ },
99
+ {
100
+ "name": "ref: K=1 ridge (dim48 + bias)",
101
+ "params": 2,
102
+ "F1": 0.8285356163978577,
103
+ "precision": 0.7985524535179138,
104
+ "recall": 0.8608582615852356,
105
+ "threshold": null
106
+ },
107
+ {
108
+ "name": "B: threshold(sum top-5 pos dims)",
109
+ "params": 1,
110
+ "F1": 0.8214052319526672,
111
+ "precision": 0.824454128742218,
112
+ "recall": 0.8183788657188416,
113
+ "threshold": 37.6262092590332
114
+ },
115
+ {
116
+ "name": "C: threshold(sum top-3 pos \u2212 sum top-3 neg)",
117
+ "params": 1,
118
+ "F1": 0.8199912905693054,
119
+ "precision": 0.8185744881629944,
120
+ "recall": 0.8214130997657776,
121
+ "threshold": 18.052114486694336
122
+ },
123
+ {
124
+ "name": "B: threshold(sum top-3 pos dims)",
125
+ "params": 1,
126
+ "F1": 0.7914334535598755,
127
+ "precision": 0.7131432294845581,
128
+ "recall": 0.8890333771705627,
129
+ "threshold": 23.4901123046875
130
+ },
131
+ {
132
+ "name": "D: threshold(max top-3 pos \u2212 max top-3 neg)",
133
+ "params": 1,
134
+ "F1": 0.7337717413902283,
135
+ "precision": 0.6004415154457092,
136
+ "recall": 0.9432163238525391,
137
+ "threshold": 4.771557331085205
138
+ },
139
+ {
140
+ "name": "D: threshold(max top-5 pos \u2212 max top-5 neg)",
141
+ "params": 1,
142
+ "F1": 0.7142618894577026,
143
+ "precision": 0.5805314779281616,
144
+ "recall": 0.9280450940132141,
145
+ "threshold": 6.046311378479004
146
+ },
147
+ {
148
+ "name": "D: threshold(max top-10 pos \u2212 max top-10 neg)",
149
+ "params": 1,
150
+ "F1": 0.7102322578430176,
151
+ "precision": 0.5886545181274414,
152
+ "recall": 0.8951018452644348,
153
+ "threshold": 4.326292991638184
154
+ },
155
+ {
156
+ "name": "D: threshold(max top-20 pos \u2212 max top-20 neg)",
157
+ "params": 1,
158
+ "F1": 0.7102322578430176,
159
+ "precision": 0.5886545181274414,
160
+ "recall": 0.8951018452644348,
161
+ "threshold": 4.326292991638184
162
+ }
163
+ ],
164
+ "top_pos_dims_30": [
165
+ 48,
166
+ 525,
167
+ 475,
168
+ 645,
169
+ 273,
170
+ 292,
171
+ 158,
172
+ 510,
173
+ 506,
174
+ 337,
175
+ 8,
176
+ 309,
177
+ 267,
178
+ 217,
179
+ 79,
180
+ 13,
181
+ 657,
182
+ 207,
183
+ 722,
184
+ 311,
185
+ 566,
186
+ 278,
187
+ 25,
188
+ 627,
189
+ 511,
190
+ 332,
191
+ 654,
192
+ 719,
193
+ 593,
194
+ 305
195
+ ],
196
+ "top_neg_dims_30": [
197
+ 642,
198
+ 224,
199
+ 113,
200
+ 565,
201
+ 49,
202
+ 637,
203
+ 45,
204
+ 520,
205
+ 219,
206
+ 290,
207
+ 529,
208
+ 617,
209
+ 269,
210
+ 745,
211
+ 576,
212
+ 701,
213
+ 105,
214
+ 694,
215
+ 82,
216
+ 283,
217
+ 574,
218
+ 310,
219
+ 613,
220
+ 90,
221
+ 92,
222
+ 650,
223
+ 36,
224
+ 53,
225
+ 396,
226
+ 17
227
+ ]
228
+ }
stage_0/eval.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "dataset": "COCO val2017",
3
+ "n_images": 5000,
4
+ "task": "image-level person presence (binary)",
5
+ "positive_rate": 0.539,
6
+ "protocol": "live Argus forward pass at 768 pixel input, no feature caching",
7
+ "metrics": {
8
+ "F1": 0.8886,
9
+ "precision": 0.9011,
10
+ "recall": 0.8763,
11
+ "optimal_threshold": 25.2845
12
+ },
13
+ "parity_cached_features": 0.881
14
+ }
stage_0/infer.py ADDED
@@ -0,0 +1,63 @@
1
+ """Reference inference for the Stage 0 baseline.
2
+
3
+ Loads Argus (EUPE-ViT-B backbone), reads the classifier config, and scores one
4
+ or more images. Prints the raw score and the binary decision.
5
+
6
+ Usage: python infer.py image1.jpg [image2.jpg ...]
7
+ """
8
+ import json, sys, os
9
+ import torch
10
+ import torch.nn.functional as F
11
+ from PIL import Image
12
+ import numpy as np
13
+ from transformers import AutoModel
14
+
15
+
16
+ def load_classifier(path='classifier.json'):
17
+ with open(path) as f:
18
+ return json.load(f)
19
+
20
+
21
+ def load_argus(repo_or_path='phanerozoic/argus'):
22
+ return AutoModel.from_pretrained(repo_or_path, trust_remote_code=True)
23
+
24
+
25
+ def preprocess(image_path, resolution=768, device='cuda'):
26
+ img = Image.open(image_path).convert('RGB').resize((resolution, resolution), Image.BILINEAR)
27
+ arr = np.asarray(img, dtype=np.uint8).copy()
28
+ x = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0).to(device).float() / 255.0
29
+ mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1).to(device)
30
+ std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1).to(device)
31
+ return (x - mean) / std
32
+
33
+
34
+ @torch.inference_mode()
35
+ def score(model, x, classifier):
36
+ with torch.autocast('cuda', dtype=torch.bfloat16):
37
+ out = model.backbone.forward_features(x)
38
+ patches = out['x_norm_patchtokens'].float().squeeze(0)
39
+ D = classifier['feature_dim']
40
+ ln = F.layer_norm(patches, [D])
41
+ pooled = ln.max(dim=0).values
42
+ pos = pooled[classifier['pos_dims']].sum()
43
+ neg = pooled[classifier['neg_dims']].sum()
44
+ return float((pos - neg).item())
45
+
46
+
47
+ def main():
48
+ if len(sys.argv) < 2:
49
+ print('usage: python infer.py <image1> [image2 ...]')
50
+ sys.exit(1)
51
+ here = os.path.dirname(os.path.abspath(__file__))
52
+ classifier = load_classifier(os.path.join(here, 'classifier.json'))
53
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
54
+ model = load_argus().to(device).eval()
55
+ thr = classifier['threshold']
56
+ for image_path in sys.argv[1:]:
57
+ x = preprocess(image_path, classifier['input_resolution'], device)
58
+ s = score(model, x, classifier)
59
+ print(f'{image_path} score={s:+.3f} threshold={thr:+.3f} person={s > thr}')
60
+
61
+
62
+ if __name__ == '__main__':
63
+ main()
stage_1/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Stage 1: Output-Channel Pruning
2
+
3
+ Reserved. See repo root README for plan.
4
+
5
+ Scope: keep only the 100 feature dimensions the Stage 0 classifier reads, remove the remaining 668 output channels from EUPE-ViT-B's final projection. No retraining. Expected to preserve F1 exactly.
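+
+ A minimal sketch of the intended channel slicing, with caveats: it assumes the final projection is exposed as a plain `nn.Linear` (the attribute name and access path are hypothetical), and it only illustrates the keep-rows operation, not the full Stage 1 procedure:
+
+ ```python
+ import torch.nn as nn
+
+ def prune_output_channels(proj: nn.Linear, keep: list[int]) -> nn.Linear:
+     # new projection that emits only the kept feature dims, copied row-wise
+     pruned = nn.Linear(proj.in_features, len(keep), bias=proj.bias is not None)
+     pruned.weight.data = proj.weight.data[keep].clone()
+     if proj.bias is not None:
+         pruned.bias.data = proj.bias.data[keep].clone()
+     return pruned
+ ```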
stage_2/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Stage 2: Attention-Head Pruning
2
+
3
+ Reserved. See repo root README for plan.
4
+
5
+ Scope: measure each of EUPE-ViT-B's 144 attention heads (12 blocks x 12 heads) for contribution to the 100 dims Stage 0 reads. Ablate low-contribution heads.
stage_3/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Stage 3: Depth Reduction
2
+
3
+ Reserved. See repo root README for plan.
4
+
5
+ Scope: drop transformer blocks that do not route signal to the 100 Stage 0 dims.
stage_4/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Stage 4: Specialist Backbone
2
+
3
+ Reserved. See repo root README for plan.
4
+
5
+ Scope: train a small student network that emits only the 100 target dims of EUPE-ViT-B, supervised against those dims on a large image corpus.
stage_5/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Stage 5: Circuit-Level Synthesis
2
+
3
+ Reserved. See repo root README for plan.
4
+
5
+ Scope: synthesize the entire fixed-weight pipeline to gates and dead-code eliminate everything that does not reach the classifier output.