LFM2.5-Embedding-350M: CrispEmbed GGUF

CrispEmbed-native GGUF quantizations of LiquidAI/LFM2.5-Embedding-350M.

Format note: These GGUFs use CrispEmbed's internal tensor naming (lfm.* prefix, arch=lfm2). They are not interchangeable with the official LiquidAI GGUFs, which target llama.cpp (lfm2-bidir arch, blk.* tensor naming). Use the LiquidAI GGUFs if you want to run the model with llama.cpp or llama-server.


Files

| File | Size | Description |
|---|---|---|
| lfm2-embed-q8_0.gguf | 359 MB | 8-bit quantization: best accuracy, recommended |
| lfm2-embed-q4_k.gguf | 222 MB | 4-bit K-quant: about 3× compression relative to f16, minimal quality loss |
| lfm2-embed-f16.gguf | 678 MB | Full fp16: reference precision |
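
As a quick arithmetic check, the compression ratios relative to the fp16 file can be derived from the listed sizes. The sketch below uses only the sizes given for these three files; the helper name `compression_ratio` is made up for illustration.

```python
# File sizes in MB, copied from the file listing above.
sizes_mb = {"f16": 678, "q8_0": 359, "q4_k": 222}


def compression_ratio(quant: str, reference: str = "f16") -> float:
    """Return how many times smaller `quant` is than `reference`."""
    return sizes_mb[reference] / sizes_mb[quant]


for name in ("q8_0", "q4_k"):
    print(f"{name}: {compression_ratio(name):.2f}x smaller than f16")
# q8_0: 1.89x smaller than f16
# q4_k: 3.05x smaller than f16
```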

Parity (CrispEmbed q8_0 vs HF float32 Lfm2BidirectionalModel)

| Stage | Cosine | Notes |
|---|---|---|
| per-layer (all 20) | ≥ 0.9999 | measured on 3-token input via test-lfm2-diff |
| CLS embedding q8_0 | 0.9999 | 5 diverse test sentences |
| CLS embedding q4_k | 0.982 | expected q4_k quantization floor |
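
The cosine figures compare a CrispEmbed output against the reference output for the same input. A minimal sketch of that metric in plain Python follows; the two toy vectors are invented for illustration, and the actual test-lfm2-diff tool may compute the score differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical values standing in for a float32 reference embedding
# and the same embedding after quantization.
reference = [0.12, -0.48, 0.33, 0.80]
quantized = [0.12, -0.47, 0.33, 0.80]

print(f"cosine = {cosine(reference, quantized):.4f}")
```

A value of 1.0 means the two vectors point in exactly the same direction; quantization error pulls the score slightly below that.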

Model

  • Architecture: 16-layer hybrid (10 ShortConv + 6 GQA attention), hidden=1024
  • Pooling: CLS token (position 0) of last hidden state, L2-normalized
  • Dimension: 1024
  • Languages: 11 (en, de, fr, es, it, pt, nl, pl, ru, ja, zh)
  • Parameters: 350M
  • Task prefixes: "query: " for queries, "document: " for passages
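
The pooling and prefix rules above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not CrispEmbed's implementation: the function names are hypothetical, and the toy hidden state has 4 dimensions instead of 1024.

```python
import math


def with_prefix(text, is_query):
    """Prepend the task prefix the model expects."""
    prefix = "query: " if is_query else "document: "
    return prefix + text


def cls_pool(last_hidden_state):
    """Take the token at position 0 and L2-normalize it.

    `last_hidden_state` is a list of per-token vectors.
    """
    cls = last_hidden_state[0]
    norm = math.sqrt(sum(x * x for x in cls))
    return [x / norm for x in cls]


# Toy hidden state: 3 tokens, 4 dimensions each.
hidden = [
    [3.0, 4.0, 0.0, 0.0],   # position 0, used as the embedding
    [1.0, 1.0, 1.0, 1.0],
    [0.5, -0.5, 2.0, 0.0],
]

emb = cls_pool(hidden)
print(emb)  # [0.6, 0.8, 0.0, 0.0]
print(with_prefix("What is the capital of France?", is_query=True))
```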

Usage with CrispEmbed

CLI

# Download
./crispembed --download lfm2-embed

# Embed a query (prefix auto-applied)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf "What is the capital of France?"

# Embed a document (disable auto-prefix and supply explicitly, or use --prefix)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf \
  --prefix "document: " "Paris is the capital of France."

# JSON output for downstream use
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf --json "query: machine learning"

Python (via crispembed Python bindings)

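import os

import numpy as np

import crispembed

# Expand "~" explicitly in case the bindings do not do it themselves
model = crispembed.load(os.path.expanduser("~/.cache/crispembed/lfm2-embed-q8_0.gguf"))

# Task prefixes are written out explicitly here
query_emb = model.encode("query: What is the capital of France?")
doc_emb   = model.encode("document: Paris is the capital of France.")

# Both embeddings are already L2-normalized, so the dot product is the cosine similarity
score = np.dot(query_emb, doc_emb)
print(f"Similarity: {score:.4f}")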
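
Because the embeddings are L2-normalized, ranking several passages against one query reduces to sorting by dot product. The sketch below shows that step on toy 2-dimensional unit vectors; it does not call crispembed, and the `rank` helper is a hypothetical name.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank(query_emb, doc_embs):
    """Return (index, score) pairs sorted by descending similarity."""
    scores = [(i, dot(query_emb, d)) for i, d in enumerate(doc_embs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy unit vectors standing in for real 1024-dimensional embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank(query, docs)
print([i for i, _ in ranking])  # [2, 1, 0]
```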

Rust

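use crispembed::CrispEmbed;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the GGUF and embed one prefixed query
    let model = CrispEmbed::load("lfm2-embed-q8_0.gguf")?;
    let emb = model.encode("query: hello world")?;
    Ok(())
}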

Comparison with official LiquidAI GGUFs

| | This repo | LiquidAI/LFM2.5-Embedding-350M-GGUF |
|---|---|---|
| Runtime | CrispEmbed | llama.cpp / llama-server |
| GGUF arch tag | lfm2 | lfm2-bidir |
| Tensor naming | lfm.* prefix | blk.* / llama.cpp convention |
| Quantizations | f16, q8_0, q4_k | BF16, F16, Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0 |
| q8_0 size | 359 MB | 379 MB |
| Metal GPU | Yes (Apple Silicon) | Yes |
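
To tell the two kinds of file apart, one option is to read the `general.architecture` key from the GGUF header. The sketch below is a minimal reader written against the public GGUF layout (version 2 or later, little-endian); it is not part of CrispEmbed or llama.cpp, and the function names are hypothetical.

```python
import io
import struct

# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
                 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read_string(f):
    # GGUF strings are a uint64 length followed by UTF-8 bytes.
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def _skip_value(f, value_type):
    if value_type in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[value_type], io.SEEK_CUR)
    elif value_type == _STRING:
        _read_string(f)
    elif value_type == _ARRAY:
        # Arrays store element type (uint32), count (uint64), then elements.
        elem_type, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_arch(f):
    """Return general.architecture from an open binary stream, or None."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    # Header: version (uint32), tensor count (uint64), metadata count (uint64).
    _version, _tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    for _ in range(kv_count):
        key = _read_string(f)
        (value_type,) = struct.unpack("<I", f.read(4))
        if key == "general.architecture" and value_type == _STRING:
            return _read_string(f)
        _skip_value(f, value_type)
    return None
```

Passing a file opened with `open(path, "rb")` should, going by the table above, yield `lfm2` for the files in this repo and `lfm2-bidir` for the official LiquidAI ones.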

Conversion

Convert from the source model yourself:

git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed

# Download the source model and convert it to an f16 GGUF
python models/convert-lfm2-embed-to-gguf.py \
    --model LiquidAI/LFM2.5-Embedding-350M \
    --output lfm2-embed-f16.gguf --dtype f16

# Quantize
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q8_0.gguf q8_0
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q4_k.gguf q4_k
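
For intuition about what the q8_0 step does, here is a simplified round trip in the style of ggml's Q8_0: weights are split into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. This is a sketch under assumptions. Whether crispembed-quantize uses exactly this layout is not confirmed here, and the real format stores the scale as fp16, which this example skips.

```python
import math

BLOCK = 32  # elements per block, as in ggml's Q8_0


def quantize_q8_0(values):
    """Return a list of (scale, int8 values) blocks."""
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        # Map the largest magnitude in the block to +/-127.
        scale = amax / 127.0 if amax > 0 else 1.0
        quants = [round(v / scale) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Rebuild approximate float values from the blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


# Toy weights; real tensors come from the f16 GGUF.
weights = [math.sin(i) for i in range(64)]
restored = dequantize_q8_0(quantize_q8_0(weights))

max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_err:.5f}")
```

The per-element error is bounded by half a quantization step, that is, half the block's scale, which is why 8-bit blocks stay so close to the fp16 reference.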

License

LFM1.0, same as the base model.
