LiquidAI/LFM2.5-ColBERT-350M

Fix short-conv padding masking on transformers >=5

#1

by Satyen - opened 6 days ago

base: refs/heads/main ← from: refs/pr/1

initial commit ca3550b1

initial release: LFM2 ColBERT (bidi-patched), from train_1369809/best 95510753

Update weights to checkpoint train_1811379_colbert-ft-s2/best (multilingual NanoBEIR ndcg@10 0.5854 -> 0.6044) d62e5a5a

Add FA2-capable modeling file but disable flash_attention_2 via config (incompatible with PyLate query expansion; ~10 ndcg@10 loss) 293dacd0

readme: update repo id to LFM2.5-ColBERT-350M 2095737a

readme: full model card with bidi/LFM2.5 details and PyLate usage 40186115

readme: intro both models; mark shared architecture/training sections f3c261aa

readme: shorter intro + side-by-side specs table; drop arch/training (will live in blog) e380c6a4

readme: drop 'this model' qualifiers; move RAG use cases under Model details 12ec4bf6

readme: move use cases below the specs table and module repr c682785b

readme: swap banner image to 2b08LKpev0DNEk6DlnWkY.png 57649f68

readme: add NanoBEIR Multilingual Extended benchmark tables (NDCG@10 + second metric) 6f2c0d5f

readme: label second benchmark as MKQA Recall@20 1a96e819

readme: add LFM2 tech report citation 57f6d036

readme: add sales contact link 00aa1a88

Update README.md 632d72a8

Update README.md 94d86b57

Update README.md 0c2d5381

Update README.md a54a3d74

Update README.md b77982de

readme: enrich How-to-run with the original LFM2-ColBERT-350M prose + step-comments 7953fb84

readme: drop trust_remote_code callout 9d5406c4

fix: accept input_embeds/inputs_embeds kwarg rename across transformers versions a90f6dc0

Update README.md 7cd22063

Update README.md eeb070da

Update README.md 18df1d51

Update README.md ae034743

Create LICENSE 1946bdfc

Update README.md 8f435c15

Update README.md e2abfcac

Update README.md b0125af8

Update README.md 1d67ee45

Update README.md 90c71847

readme: add inference-speed table (llama.cpp, M4 Max, fp16) ff12c5df

readme: speed table: remove bold and add 32q/256d setup note fbe86074

Update README.md ca4ca7e0

Update README.md c0c24306

readme: add Enterprise GPU serving section (image + p50/p95/p99 table) faf602d8

readme: split inference-speed sections (llama.cpp vs Enterprise GPU) 1e1d480e

Update README.md 7407b379

readme: stack GPU plot above the latency table instead of side-by-side c6904f7a

Update README.md 466cacb1

cast weights to bf16 (lossless from training precision) 4641209a

cast dense head to bf16 8ffefb6f

config: dtype bfloat16 4b361060

Update README.md 8b3288d1

Liquid AI org 6 days ago

•

edited 6 days ago

On transformers >=5, Lfm2Model.forward routes the raw 2D padding mask (not the
4D additive mask) to the short-conv layers. The shipped _noncausal_shortconv_forward
then runs apply_mask_to_padding_states, which is a no-op on a 4D mask (the 4.56
path the checkpoint was trained with) but zeroes padding/query-expansion states
on a 2D mask. That shifts the per-token embeddings that ColBERT scores with MaxSim.
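
To make the effect on scoring concrete, here is a minimal sketch of MaxSim (sum over query tokens of the best dot product against any document token) on toy 2-d vectors. The function name `maxsim` and all numbers are illustrative assumptions, not taken from this repository or from PyLate. Zeroing the final query-expansion embedding is used here only as a simplified stand-in for the real effect, where the zeroing happens on intermediate short-conv states.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_embs, doc_embs):
    """Late-interaction score: for each query token take the best-matching
    document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


# Toy data (assumed): the second query vector plays the role of a
# query-expansion token that the model was trained to keep "live".
doc = [[1.0, 0.0], [0.5, 0.5]]
query_trained = [[1.0, 0.0], [0.0, 1.0]]
query_zeroed = [[1.0, 0.0], [0.0, 0.0]]  # expansion token zeroed out

score_trained = maxsim(query_trained, doc)
score_zeroed = maxsim(query_zeroed, doc)
print(score_trained, score_zeroed)
```

The expansion token contributes its own max term to the sum, so any change to its embedding changes the document score and can reorder results.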

Fix: gate the masking to the flash_attention_2 path only; eager/sdpa match
training behavior on every transformers version.
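
The gating described above can be sketched as follows. This is a pure-Python illustration on nested lists, not the code in this PR: `mask_padding_states` and `shortconv_input` are hypothetical helpers, and passing the attention implementation as a plain string is an assumption made for the example.

```python
def mask_padding_states(hidden_states, padding_mask):
    """Zero the hidden state of every position whose mask entry is 0.
    hidden_states: [batch][seq][dim], padding_mask: [batch][seq] of 0/1."""
    return [
        [[value * keep for value in token] for token, keep in zip(seq, row)]
        for seq, row in zip(hidden_states, padding_mask)
    ]


def shortconv_input(hidden_states, padding_mask, attn_implementation):
    """Apply padding masking only on the flash_attention_2 path
    (hypothetical gate); eager/sdpa pass the states through untouched,
    matching the no-op behavior described for training."""
    if attn_implementation == "flash_attention_2" and padding_mask is not None:
        return mask_padding_states(hidden_states, padding_mask)
    return hidden_states


hidden = [[[1.0, 2.0], [3.0, 4.0]]]  # one sequence, two tokens
mask = [[1, 0]]                      # second token is padding/expansion

out_sdpa = shortconv_input(hidden, mask, "sdpa")
out_fa2 = shortconv_input(hidden, mask, "flash_attention_2")
print(out_sdpa, out_fa2)
```

With this gate, the result on eager/sdpa no longer depends on whether the caller supplies a 2D or a 4D mask, which is what makes the behavior independent of the transformers version.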

English NanoBEIR NDCG@10 (identical eval stack):
  transformers 5.3.0, fp32: unfixed 0.6506 -> fixed 0.6863 (= card 0.687)
  transformers 5.3.0, bf16: unfixed 0.6412 -> fixed 0.6771

Fix short-conv padding masking on transformers >=5 6f3bb287

EdoardoMosca changed pull request status to merged 6 days ago
