Update model card with all32 status
README.md
CHANGED
@@ -131,6 +131,14 @@ Per-layer step-500 mass@K at K=128:
 
 The next run reserves `[0, 1, 2, 35]` and trains layers `3..34`.
 
+First diagnostic from the active all32 run:
+
+| Step | Recall@K eval | PPL gap | Read |
+|---:|---:|---:|---|
+| 250 | 0.812 | +2.28% | already better than all36 best training eval |
+
+This is not a final result; the run is continuing toward step 1000.
+
 ## Positioning against related methods
 
 The paper frames this method as closest in asymptotic shape to Reformer and
@@ -152,6 +160,14 @@ superiority. The clean result proves the approach for the six-layer pilot; the
 active all32 reserved-layer run tests whether broad near-whole-model
 substitution can preserve that quality.
 
+This method targets a different deployment scenario than native
+sliding-window/state-space/hybrid architectures such as Mistral-style sliding
+window, Mamba, or Qwen3.6 Gated DeltaNet hybrids. Those models are trained from
+scratch with their sparse or hybrid mechanism in place. This work is post-hoc:
+train a base model with full attention for maximum expressivity, then add
+lightweight retrieval projections afterward to make inference sub-linear without
+changing base weights.
+
 ## Checkpoints
 
 Important checkpoint paths in this HF repo: