Update rapid_prototype_output_bigger_bank.txt
rapid_prototype_output_bigger_bank.txt
CHANGED
@@ -10,7 +10,7 @@ PHASE 0: EXTRACTION
 
 Extracting: bert...
 Loading weights: 100%
-199/199 [00:00<00:00,
+199/199 [00:00<00:00, 4263.57it/s, Materializing param=pooler.dense.weight]
 BertModel LOAD REPORT from: google-bert/bert-base-uncased
 Key | Status | |
 -------------------------------------------+------------+--+-
@@ -29,7 +29,7 @@ Notes:
 
 Extracting: modern...
 Loading weights: 100%
-134/134 [00:00<00:00,
+134/134 [00:00<00:00, 4171.61it/s, Materializing param=layers.21.mlp_norm.weight]
 ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
 Key | Status | |
 ------------------+------------+--+-
@@ -71,42 +71,42 @@ PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
 Student embeddings: torch.Size([18000, 768])
 Expert 0 (bert): rotation loaded, cos_after=1.0000
 Expert 1 (modern): rotation loaded, cos_after=0.4849
-Anchors:
-Bank: 1,
-E 1: 1s loss=0.
-E 2: 1s loss=0.
-E 3: 1s loss=0.
-E 4: 1s loss=0.
-E 5: 1s loss=0.
-E 6: 1s loss=0.
-E 7: 1s loss=0.
-E 8: 1s loss=0.
-E 9: 1s loss=0.
-E10: 1s loss=0.
-E11: 1s loss=0.
-E12: 1s loss=0.
-E13: 1s loss=0.
-E14: 1s loss=0.
-E15: 1s loss=0.
-E16: 1s loss=0.
-E17: 1s loss=0.
-E18: 1s loss=0.
-E19: 1s loss=0.
-E20: 1s loss=0.
+Anchors: 512 initialized from consensus embeddings
+Bank: 1,649,216 params
+E 1: 1s loss=0.4828 v_loss=0.4384 expert_agr=0.00011 ortho=0.00007 spread=0.02783 cv=0.2529 anchor_max=0.625 expert_cos=0.827±0.045
+E 2: 1s loss=0.3974 v_loss=0.3863 expert_agr=0.00007 ortho=0.00009 spread=0.02224 cv=0.2140 anchor_max=0.653 expert_cos=0.832±0.052
+E 3: 1s loss=0.3647 v_loss=0.3696 expert_agr=0.00006 ortho=0.00008 spread=0.02128 cv=0.2076 anchor_max=0.656 expert_cos=0.788±0.056
+E 4: 1s loss=0.3518 v_loss=0.3461 expert_agr=0.00005 ortho=0.00010 spread=0.02102 cv=0.2051 anchor_max=0.656 expert_cos=0.800±0.053
+E 5: 1s loss=0.3428 v_loss=0.3584 expert_agr=0.00005 ortho=0.00011 spread=0.02022 cv=0.1954 anchor_max=0.657 expert_cos=0.787±0.055
+E 6: 1s loss=0.3386 v_loss=0.3424 expert_agr=0.00005 ortho=0.00014 spread=0.01978 cv=0.1941 anchor_max=0.656 expert_cos=0.755±0.061
+E 7: 1s loss=0.3365 v_loss=0.3332 expert_agr=0.00004 ortho=0.00013 spread=0.01886 cv=0.1947 anchor_max=0.656 expert_cos=0.763±0.060
+E 8: 1s loss=0.3337 v_loss=0.3413 expert_agr=0.00004 ortho=0.00013 spread=0.01852 cv=0.1900 anchor_max=0.656 expert_cos=0.792±0.053
+E 9: 1s loss=0.3332 v_loss=0.3489 expert_agr=0.00004 ortho=0.00014 spread=0.01789 cv=0.1942 anchor_max=0.655 expert_cos=0.735±0.060
+E10: 1s loss=0.3314 v_loss=0.3395 expert_agr=0.00004 ortho=0.00013 spread=0.01814 cv=0.1903 anchor_max=0.655 expert_cos=0.721±0.065
+E11: 1s loss=0.3286 v_loss=0.3470 expert_agr=0.00004 ortho=0.00013 spread=0.01801 cv=0.1832 anchor_max=0.655 expert_cos=0.775±0.060
+E12: 1s loss=0.3285 v_loss=0.3399 expert_agr=0.00004 ortho=0.00016 spread=0.01787 cv=0.1861 anchor_max=0.654 expert_cos=0.761±0.058
+E13: 1s loss=0.3275 v_loss=0.3392 expert_agr=0.00004 ortho=0.00015 spread=0.01772 cv=0.1839 anchor_max=0.655 expert_cos=0.729±0.065
+E14: 1s loss=0.3267 v_loss=0.3351 expert_agr=0.00004 ortho=0.00013 spread=0.01753 cv=0.1809 anchor_max=0.654 expert_cos=0.735±0.065
+E15: 1s loss=0.3260 v_loss=0.3497 expert_agr=0.00004 ortho=0.00014 spread=0.01737 cv=0.1809 anchor_max=0.654 expert_cos=0.772±0.061
+E16: 1s loss=0.3266 v_loss=0.3403 expert_agr=0.00003 ortho=0.00015 spread=0.01730 cv=0.1845 anchor_max=0.654 expert_cos=0.715±0.070
+E17: 1s loss=0.3245 v_loss=0.3252 expert_agr=0.00003 ortho=0.00014 spread=0.01732 cv=0.1788 anchor_max=0.654 expert_cos=0.744±0.067
+E18: 1s loss=0.3237 v_loss=0.3254 expert_agr=0.00003 ortho=0.00014 spread=0.01707 cv=0.1782 anchor_max=0.654 expert_cos=0.721±0.064
+E19: 1s loss=0.3231 v_loss=0.3327 expert_agr=0.00003 ortho=0.00015 spread=0.01721 cv=0.1784 anchor_max=0.653 expert_cos=0.690±0.072
+E20: 1s loss=0.3237 v_loss=0.3267 expert_agr=0.00003 ortho=0.00013 spread=0.01693 cv=0.1800 anchor_max=0.653 expert_cos=0.723±0.071
 
 =================================================================
 PHASE 3: GEOMETRIC VERIFICATION
 =================================================================
 Passthrough integrity: 1.000000 (should be ~1.000)
-Geo context CV: 0.
-Geo context eff_dim: 21.
+Geo context CV: 0.1651
+Geo context eff_dim: 21.5
 Geo context shape: torch.Size([2000, 64])
 
 =================================================================
 PHASE 4: CLASSIFIER STABILITY TEST
 =================================================================
-with_bank : train_acc=0.481 val_acc=0.390 gap=0.091
-without_bank : train_acc=0.443 val_acc=0.330 gap=0.113
+with_bank : train_acc=0.481 val_acc=0.390 gap=0.091
+without_bank : train_acc=0.443 val_acc=0.330 gap=0.113
 
 =================================================================
 DONE
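
Note on the PHASE 4 lines: the logged gap values match train_acc minus val_acc at the three decimals printed. The sketch below is a minimal check that uses only the numbers shown in the new log output. The function and variable names are illustrative and are not taken from the script that produced this log, which may compute gap from unrounded accuracies.

```python
# Values copied from the PHASE 4 lines of the new log output.
results = {
    "with_bank": {"train_acc": 0.481, "val_acc": 0.390, "gap": 0.091},
    "without_bank": {"train_acc": 0.443, "val_acc": 0.330, "gap": 0.113},
}


def generalization_gap(train_acc, val_acc):
    """Train minus validation accuracy, rounded to the log's 3 decimals.

    Hypothetical helper: the original script's own definition is not shown.
    """
    return round(train_acc - val_acc, 3)


def check_gaps(rows):
    """Return {name: (computed_gap, logged_gap)} for each configuration."""
    return {
        name: (generalization_gap(r["train_acc"], r["val_acc"]), r["gap"])
        for name, r in rows.items()
    }


if __name__ == "__main__":
    for name, (computed, logged) in check_gaps(results).items():
        print(f"{name:<13}: computed gap={computed:.3f} logged gap={logged:.3f}")
```

Both configurations reproduce the logged gap under this reading. In this run the with_bank configuration shows higher train and validation accuracy and a smaller train/validation gap than without_bank.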