bpe_glove_512_lora_v1_ffn

Warm-started from jsanzolac/bpe_glove_512_lora_v1/rank_512, with a per-token FFN inserted between the GloVe-attention output and the alpha-pool collapse.

Trainable: A, B, FFN. Frozen: E, teacher.
Loss: λ_c·InfoNCE + λ_D·‖ρ_T − ρ_S‖²_F with λ_c=1.0, λ_D=0.1.
Density is computed on the post-FFN per-token states; InfoNCE is on the alpha-pooled sentence vector.
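
A minimal sketch of how the two loss terms combine, in plain Python. This is an illustration, not the training code: the in-batch InfoNCE form, cosine similarity, the temperature of 0.07, and all function names are assumptions. This card does not describe how the density matrices ρ_T and ρ_S are built, so they are taken here as precomputed inputs.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Guard against a zero vector by falling back to a norm of 1.0.
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def info_nce(student, teacher, temperature=0.07):
    """In-batch InfoNCE (assumed form): student row i should match teacher row i."""
    s = [normalize(v) for v in student]
    t = [normalize(v) for v in teacher]
    total = 0.0
    for i, si in enumerate(s):
        logits = [dot(si, tj) / temperature for tj in t]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(s)

def frobenius_sq(rho_t, rho_s):
    """Squared Frobenius norm of the difference between two equal-shape matrices."""
    return sum((a - b) ** 2
               for row_t, row_s in zip(rho_t, rho_s)
               for a, b in zip(row_t, row_s))

def total_loss(student_vecs, teacher_vecs, rho_t, rho_s, lam_c=1.0, lam_d=0.1):
    # lam_c and lam_d default to the weights stated in this card.
    return (lam_c * info_nce(student_vecs, teacher_vecs)
            + lam_d * frobenius_sq(rho_t, rho_s))
```

Here `student_vecs` stands for the alpha-pooled sentence vectors, while `rho_s` would be derived from the post-FFN per-token states.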

Files:

rank_512/checkpoint_final.pt — A + B + FFN state dict (E is non-persistent; re-inject from jsanzolac/bpe_glove_512/vectors.txt).
rank_512/config.json — full hyperparameters.
rank_512/vectors_drifted.txt — E + B(A(·)) per vocab row, GloVe text format. Note: this captures only the static drifted embedding lookup, not the FFN's effect (which is contextual). To use the model end-to-end, instantiate DriftingGloVeStudentFFN and run forward.
rank_512/train_log.jsonl — per-step metrics.
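
For reading rank_512/vectors_drifted.txt without the model class, a small parser along these lines may be enough. It assumes the usual GloVe text layout (one row per line: a token followed by space-separated floats, no header line). The function name `read_glove_text` and the optional `dim` argument are hypothetical and not part of this repository.

```python
def read_glove_text(lines, dim=None):
    """Parse GloVe-style text rows ('token v1 v2 ... vd') into a dict.

    `lines` is any iterable of strings, such as an open file handle.
    If `dim` is None, the vector width is inferred from the first row.
    """
    vectors = {}
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if dim is None:
            token, *values = line.split(" ")
        else:
            # Split from the right so the last `dim` fields are the vector.
            token, *values = line.rsplit(" ", dim)
        vec = [float(x) for x in values]
        if dim is None:
            dim = len(vec)
        if len(vec) != dim:
            raise ValueError(f"expected {dim} values for token {token!r}, got {len(vec)}")
        vectors[token] = vec
    return vectors

# Tiny in-memory example; real use would pass an open file instead.
sample = ["the 0.1 -0.2 0.3", "cat 1.0 2.0 3.0"]
table = read_glove_text(sample)
```

Passing a file opened with `encoding="utf-8"` in place of `sample` should work the same way. The resulting table holds only the static drifted embeddings; the contextual FFN is not reflected in it.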


Base model: jsanzolac/bpe_glove_512 (this model is one of 2 adapters listed for it).
