AbstractPhil/procrustes-analysis

AbstractPhil committed 18 days ago

Commit eaad7fb · verified · 1 parent: d80e36b

Update analysis_bert_large_clip-vit-b+bigG+dino2-l-16.txt

Files changed (1)

analysis_bert_large_clip-vit-b+bigG+dino2-l-16.txt +309 -0

	@@ -0,0 +1,309 @@

+Loading BERT-large...
+config.json: 100%
+ 571/571 [00:00<00:00, 70.6kB/s]
+model.safetensors:  65%
+ 871M/1.34G [00:04<00:13, 34.8MB/s]
+Loading weights: 100%
+ 391/391 [00:00<00:00, 1112.55it/s, Materializing param=pooler.dense.weight]
+BertModel LOAD REPORT from: google-bert/bert-large-uncased
+Key                                        | Status     |  |
+-------------------------------------------+------------+--+-
+cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  |
+cls.predictions.transform.dense.weight     | UNEXPECTED |  |
+cls.predictions.transform.dense.bias       | UNEXPECTED |  |
+cls.seq_relationship.bias                  | UNEXPECTED |  |
+cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  |
+cls.predictions.bias                       | UNEXPECTED |  |
+cls.seq_relationship.weight                | UNEXPECTED |  |
+Notes:
+- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+======================================================================
+MODEL: BERT-large (1024d, 24L, 16H)
+======================================================================
+--- WEIGHT CATALOG ---
+  embedding                :   3 matrices,   31,780,864 params, shapes={'(2, 1024)', '(512, 1024)', '(30522, 1024)'}
+  mlp_down                 :  24 matrices,  100,663,296 params, shapes={'(1024, 4096)'}
+  mlp_up                   :  24 matrices,  100,663,296 params, shapes={'(4096, 1024)'}
+  pooler                   :   1 matrices,    1,048,576 params, shapes={'(1024, 1024)'}
+  self_attn_k              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  self_attn_o              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  self_attn_q              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  self_attn_v              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  TOTAL                    :  334,819,328 params (2D only)
+--- SVD EFFECTIVE RANK ---
+Type                       StableRank        PR   Active%   Rank90   Condition
+  mlp_down                      52.17    882.95     1.000    838.0        23.0
+  mlp_up                        27.37    856.14     1.000    832.8        33.8
+  self_attn_k                   37.72    597.14     0.949    642.1     16649.8
+  self_attn_o                  125.04    662.72     0.976    660.3     20582.9
+  self_attn_q                   50.84    606.30     0.956    643.2     60065.9
+  self_attn_v                  113.04    653.41     0.974    658.8     59710.1
+--- SPARSITY TOPOLOGY ---
+Type                        <0.0001    <0.001     <0.01      <0.1
+  embedding                  0.0018    0.0184    0.1815    0.9699
+  mlp_down                   0.0025    0.0251    0.2466    0.9954
+  mlp_up                     0.0023    0.0231    0.2283    0.9944
+  pooler                     0.0028    0.0280    0.2741    0.9981
+  self_attn_k                0.0022    0.0221    0.2178    0.9913
+  self_attn_o                0.0031    0.0308    0.2997    0.9990
+  self_attn_q                0.0022    0.0218    0.2149    0.9907
+  self_attn_v                0.0029    0.0294    0.2852    0.9989
+  FULL MODEL                 0.0024    0.0242    0.2373    0.9925
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+  self_attn_q              : 99.1%
+  self_attn_k              : 99.1%
+  self_attn_v              : 99.9%
+--- QK SIMILARITY MANIFOLD ---
+ Layer  StableRk        PR    Pos    Neg    SymDev      TopEig
+     0      6.42    266.22    457    567    1.0866       20.61
+     1      3.60    194.32    454    570    1.0641       29.77
+     2      6.14    215.22    474    550    1.1773       23.79
+     3      6.11    162.90    468    556    1.1421       34.35
+     4      5.74    237.65    455    569    1.1145       30.60
+     5      6.30    255.58    460    564    1.1704       26.16
+  ... (24 layers total)
+    23      4.07    206.01    525    499    0.8791       43.56
+  Positive eig fraction: layer 0 = 0.446, last = 0.513
+--- MLP DEAD NEURONS ---
+  Dead (<1% mean): 0/98304 (0.00%)
+  Weak (<10% mean): 0/98304 (0.00%)
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+  self_attn_q              : adj_mean=0.0002, adj_range=[-0.0035, 0.0036]
+  self_attn_k              : adj_mean=0.0003, adj_range=[-0.0036, 0.0033]
+  mlp_up                   : adj_mean=0.0315, adj_range=[0.0239, 0.0494]
+Loading CLIP-ViT-B/16 (LAION)...
+open_clip_model.safetensors: 100%
+ 599M/599M [00:03<00:00, 218MB/s]
+======================================================================
+MODEL: CLIP-ViT-B/16 LAION (768d, 12L, 12H)
+======================================================================
+--- WEIGHT CATALOG ---
+  embedding                :   1 matrices,      151,296 params, shapes={'(197, 768)'}
+  mlp_down                 :  12 matrices,   28,311,552 params, shapes={'(768, 3072)'}
+  mlp_up                   :  12 matrices,   28,311,552 params, shapes={'(3072, 768)'}
+  projection               :   1 matrices,      393,216 params, shapes={'(768, 512)'}
+  self_attn_o              :  12 matrices,    7,077,888 params, shapes={'(768, 768)'}
+  self_attn_qkv            :  12 matrices,   21,233,664 params, shapes={'(2304, 768)'}
+  TOTAL                    :   85,479,168 params (2D only)
+--- SVD EFFECTIVE RANK ---
+Type                       StableRank        PR   Active%   Rank90   Condition
+  mlp_down                     125.16    644.07     1.000    601.9        43.5
+  mlp_up                        59.69    631.06     0.993    603.4       372.0
+  self_attn_o                   77.37    515.53     0.967    491.8     37372.2
+  self_attn_qkv                 94.39    552.43     0.929    546.8     18558.2
+--- SPARSITY TOPOLOGY ---
+Type                        <0.0001    <0.001     <0.01      <0.1
+  embedding                  0.0202    0.2072    0.8578    0.9983
+  mlp_down                   0.0145    0.0992    0.5794    0.9999
+  mlp_up                     0.0101    0.0797    0.5233    0.9999
+  projection                 0.0058    0.0573    0.5237    1.0000
+  self_attn_o                0.0066    0.0655    0.5525    0.9999
+  self_attn_qkv              0.0535    0.1189    0.5087    0.9999
+  FULL MODEL                 0.0221    0.0949    0.5413    0.9999
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+  self_attn_qkv            : 100.0%
+--- QK SIMILARITY MANIFOLD ---
+ Layer  StableRk        PR    Pos    Neg    SymDev      TopEig
+     0      2.06     59.23    386    382    1.0944        5.51
+     1      3.51     82.73    447    321    0.8367        8.79
+     2      8.48    108.22    401    367    0.9786        4.70
+     3     22.88    193.84    406    362    1.0676        2.31
+     4     20.20    196.57    401    367    1.1014        2.38
+     5     26.05    249.44    384    384    1.1135        1.80
+  ... (12 layers total)
+    11     49.71    360.27    413    355    1.3842        0.53
+  Positive eig fraction: layer 0 = 0.503, last = 0.538
+--- MLP DEAD NEURONS ---
+  Dead (<1% mean): 1316/36864 (3.57%)
+  Weak (<10% mean): 1356/36864 (3.68%)
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+  self_attn_qkv            : adj_mean=-0.0004, adj_range=[-0.0024, 0.0013]
+  mlp_up                   : adj_mean=0.0075, adj_range=[0.0000, 0.0304]
+Loading DINOv2-large...
+config.json: 100%
+ 549/549 [00:00<00:00, 69.5kB/s]
+model.safetensors:  94%
+ 1.15G/1.22G [00:06<00:03, 20.6MB/s]
+Loading weights: 100%
+ 439/439 [00:00<00:00, 1139.02it/s, Materializing param=layernorm.weight]
+======================================================================
+MODEL: DINOv2-large (1024d, 24L, 16H)
+======================================================================
+--- WEIGHT CATALOG ---
+  embedding                :   1 matrices,        1,024 params, shapes={'(1, 1024)'}
+  mlp_down                 :  24 matrices,  100,663,296 params, shapes={'(1024, 4096)'}
+  mlp_up                   :  24 matrices,  100,663,296 params, shapes={'(4096, 1024)'}
+  self_attn_k              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  self_attn_o              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  self_attn_q              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  self_attn_v              :  24 matrices,   25,165,824 params, shapes={'(1024, 1024)'}
+  TOTAL                    :  301,990,912 params (2D only)
+--- SVD EFFECTIVE RANK ---
+Type                       StableRank        PR   Active%   Rank90   Condition
+  mlp_down                      94.40    810.58     1.000    805.1        39.8
+  mlp_up                        58.43    764.26     0.979    769.8        50.2
+  self_attn_k                   55.47    485.95     0.827    533.2   1024763.2
+  self_attn_o                   85.58    642.50     0.955    636.4     83125.7
+  self_attn_q                   57.74    477.74     0.826    536.0    630324.9
+  self_attn_v                   94.84    590.99     0.932    610.2    490421.1
+--- SPARSITY TOPOLOGY ---
+Type                        <0.0001    <0.001     <0.01      <0.1
+  embedding                  1.0000    1.0000    1.0000    1.0000
+  mlp_down                   0.0072    0.0714    0.6036    0.9999
+  mlp_up                     0.0078    0.0687    0.5577    0.9999
+  self_attn_k                0.0081    0.0774    0.5406    0.9998
+  self_attn_o                0.0069    0.0687    0.5753    1.0000
+  self_attn_q                0.0088    0.0793    0.5452    0.9997
+  self_attn_v                0.0088    0.0861    0.5810    1.0000
+  FULL MODEL                 0.0077    0.0727    0.5740    0.9999
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+  self_attn_q              : 100.0%
+  self_attn_k              : 100.0%
+  self_attn_v              : 100.0%
+--- QK SIMILARITY MANIFOLD ---
+ Layer  StableRk        PR    Pos    Neg    SymDev      TopEig
+     0      1.23      5.71    510    514    1.3859       12.89
+     1      5.40     35.56    515    509    1.0933        3.52
+     2      4.28     74.13    531    493    1.0389        4.57
+     3      4.49     80.31    559    465    1.0370        6.89
+     4      7.19    121.15    524    500    1.0951        4.28
+     5      7.72    117.31    551    473    0.9584        5.87
+  ... (24 layers total)
+    23      6.71    341.20    561    463    1.1911        2.44
+  Positive eig fraction: layer 0 = 0.498, last = 0.548
+--- MLP DEAD NEURONS ---
+  Dead (<1% mean): 0/98304 (0.00%)
+  Weak (<10% mean): 0/98304 (0.00%)
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+  self_attn_q              : adj_mean=-0.0003, adj_range=[-0.0027, 0.0035]
+  self_attn_k              : adj_mean=-0.0002, adj_range=[-0.0026, 0.0030]
+  mlp_up                   : adj_mean=0.0058, adj_range=[0.0006, 0.0217]
+Loading CLIP-ViT-bigG/14 (LAION)...
+open_clip_model.safetensors: 100%
+ 10.2G/10.2G [00:29<00:00, 377MB/s]
+======================================================================
+MODEL: CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)
+======================================================================
+--- WEIGHT CATALOG ---
+  embedding                :   1 matrices,      427,648 params, shapes={'(257, 1664)'}
+  mlp_down                 :  48 matrices,  654,311,424 params, shapes={'(1664, 8192)'}
+  mlp_up                   :  48 matrices,  654,311,424 params, shapes={'(8192, 1664)'}
+  projection               :   1 matrices,    2,129,920 params, shapes={'(1664, 1280)'}
+  self_attn_o              :  48 matrices,  132,907,008 params, shapes={'(1664, 1664)'}
+  self_attn_qkv            :  48 matrices,  398,721,024 params, shapes={'(4992, 1664)'}
+  TOTAL                    : 1,842,808,448 params (2D only)
+--- SVD EFFECTIVE RANK ---
+Type                       StableRank        PR   Active%   Rank90   Condition
+  mlp_down                      58.27    757.89     0.644    855.5  5983209984.0
+  mlp_up                        23.11    992.74     0.804   1045.1   6682717.5
+  self_attn_o                   48.31    547.82     0.531    593.5  5320487424.0
+  self_attn_qkv                102.36    834.12     0.757    890.4   1150494.6
+--- SPARSITY TOPOLOGY ---
+Type                        <0.0001    <0.001     <0.01      <0.1
+  embedding                  0.0255    0.2521    0.9654    0.9991
+  mlp_down                   0.3578    0.4691    0.6310    0.9473
+  mlp_up                     0.1763    0.3691    0.7113    1.0000
+  projection                 0.0047    0.0469    0.4397    1.0000
+  self_attn_o                0.3510    0.4770    0.6900    0.9838
+  self_attn_qkv              0.1685    0.2917    0.7124    0.9999
+  FULL MODEL                 0.2514    0.3952    0.6812    0.9801
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+  self_attn_qkv            : 100.0%
+--- QK SIMILARITY MANIFOLD ---
+ Layer  StableRk        PR    Pos    Neg    SymDev      TopEig
+     0      1.18      9.24    829    835    1.0608       13.79
+     1      2.50     32.71    834    830    1.0916        3.81
+     2      1.63     11.28    831    833    0.8739        2.24
+     3      2.06     13.32    832    832    1.2697        2.45
+     4      1.96     23.28    836    828    1.1835        6.06
+     5      3.96     41.52    839    825    1.0728        4.42
+  ... (48 layers total)
+    47     32.79    637.78    968    696    1.2396        1.92
+  Positive eig fraction: layer 0 = 0.498, last = 0.582
+--- MLP DEAD NEURONS ---
+  Dead (<1% mean): 0/393216 (0.00%)
+  Weak (<10% mean): 24163/393216 (6.14%)
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+  self_attn_qkv            : adj_mean=0.0000, adj_range=[-0.0029, 0.0017]
+  mlp_up                   : adj_mean=0.0552, adj_range=[-0.0053, 0.2689]
+======================================================================
+CROSS-MODEL COMPARISON
+======================================================================
+--- Q/K/V SPARSITY (<0.1 threshold) ---
+Model                                                  Q         K         V       QKV
+  BERT-large (1024d, 24L, 16H)                     99.1%     99.1%     99.9%         -
+  CLIP-ViT-B/16 LAION (768d, 12L, 12H)                 -         -         -    100.0%
+  DINOv2-large (1024d, 24L, 16H)                  100.0%    100.0%    100.0%         -
+  CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)             -         -         -    100.0%
+  T5-Small (512d, 6L, 8H) [reference]              93.7%     19.2%     12.1%         -
+  T5-Base (768d, 12L, 12H) [reference]             99.4%     30.0%     16.2%         -
+--- SVD STABLE RANK (mean across layers) ---
+Model                                                  Q         K         V    MLP_up
+  BERT-large (1024d, 24L, 16H)                      50.8      37.7     113.0      27.4
+  CLIP-ViT-B/16 LAION (768d, 12L, 12H)                 -         -         -      59.7
+  DINOv2-large (1024d, 24L, 16H)                    57.7      55.5      94.8      58.4
+  CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)             -         -         -      23.1
+--- QK MANIFOLD: POSITIVE EIGENVALUE FRACTION ---
+Model                                             First      Last     Trend
+  BERT-large (1024d, 24L, 16H)                    0.446     0.513    +0.066
+  CLIP-ViT-B/16 LAION (768d, 12L, 12H)            0.503     0.538    +0.035
+  DINOv2-large (1024d, 24L, 16H)                  0.498     0.548    +0.050
+  CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)        0.498     0.582    +0.084
+--- MLP DEAD NEURONS (<1% of mean) ---
+  BERT-large (1024d, 24L, 16H)               : 0/98304 (0.00%)
+  CLIP-ViT-B/16 LAION (768d, 12L, 12H)       : 1316/36864 (3.57%)
+  DINOv2-large (1024d, 24L, 16H)             : 0/98304 (0.00%)
+  CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)   : 0/393216 (0.00%)
+======================================================================
+BATTERY COMPLETE
+======================================================================
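
The log above reports spectral summaries (StableRank, PR, Rank90, Condition), near-zero weight fractions, and positive eigenvalue fractions, but it does not state how any of them are defined. The sketch below shows one plausible reading in plain Python. It is not the script that produced this file. It assumes that stable rank is the squared Frobenius norm divided by the squared spectral norm, that PR is a participation ratio of the form (sum s)^2 / sum s^2, that Rank90 counts the leading singular values needed to reach 90% of the squared-singular-value total, that Condition is the largest singular value over the smallest, and that the sparsity columns are fractions of entries whose absolute value falls below the threshold. All function names are hypothetical.

```python
import math


def spectral_summary(singular_values, energy=0.90):
    """Summarise a list of singular values under the assumed definitions."""
    s = sorted((float(x) for x in singular_values), reverse=True)
    if not s or s[0] <= 0.0:
        raise ValueError("need at least one positive singular value")
    squares = [x * x for x in s]
    total = sum(squares)

    # Assumed stable rank: ||W||_F^2 / ||W||_2^2.
    stable_rank = total / squares[0]

    # Assumed participation ratio: (sum s)^2 / sum s^2.
    participation_ratio = sum(s) ** 2 / total

    # Assumed Rank90: smallest k whose top-k squared values reach `energy`.
    running = 0.0
    rank_at_energy = len(s)
    for k, value in enumerate(squares, start=1):
        running += value
        if running >= energy * total:
            rank_at_energy = k
            break

    # Assumed condition number: largest over smallest singular value.
    condition = s[0] / s[-1] if s[-1] > 0.0 else math.inf

    return {
        "stable_rank": stable_rank,
        "participation_ratio": participation_ratio,
        "rank_at_energy": rank_at_energy,
        "condition": condition,
    }


def near_zero_fraction(weights, threshold):
    """Fraction of entries with absolute value below `threshold` (assumed)."""
    values = list(weights)
    if not values:
        raise ValueError("need at least one weight")
    return sum(1 for w in values if abs(w) < threshold) / len(values)


def positive_fraction(pos, neg):
    """Share of positive eigenvalues, given the Pos and Neg counts."""
    return pos / (pos + neg)
```

Under these assumptions, `positive_fraction(457, 567)` rounds to 0.446, which matches the layer 0 value the log reports for BERT-large. The singular values for a weight matrix `W` could be obtained with `numpy.linalg.svd(W, compute_uv=False)`. The per-type figures in the log look like means across layers, which would explain non-integer Rank90 values such as 832.8.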