LFM2.5-Embedding-350M - CrispEmbed GGUF
CrispEmbed-native GGUF quantizations of LiquidAI/LFM2.5-Embedding-350M.
Format note: These GGUFs use CrispEmbed's internal tensor naming (lfm.* prefix, arch=lfm2). They are not interchangeable with the official LiquidAI GGUFs, which target llama.cpp (lfm2-bidir arch, blk.* tensor naming). Use the LiquidAI GGUFs if you want to run the model with llama.cpp or llama-server.
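To check which flavor a downloaded file is, you can read its GGUF metadata directly. The following is a minimal sketch of a GGUF v2/v3 header reader using only the Python standard library. It assumes the arch tag is stored under the conventional `general.architecture` key, which this card does not state explicitly; the function name `read_gguf_metadata` is illustrative and not part of CrispEmbed.

```python
import struct
from typing import Any, BinaryIO, Dict

GGUF_MAGIC = b"GGUF"

# Fixed-size GGUF metadata value types: type id -> little-endian struct format.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _unpack(f: BinaryIO, fmt: str) -> Any:
    """Read exactly one value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _unpack(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _unpack(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _unpack(f, "<I")
        count = _unpack(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Return the key/value metadata of a GGUF v2/v3 stream (tensors are skipped)."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _unpack(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    _unpack(f, "<Q")  # tensor count, not needed for metadata
    kv_count = _unpack(f, "<Q")
    meta: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _unpack(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta
```

Open a file in binary mode, pass it to `read_gguf_metadata`, and look up `general.architecture`. Under the assumption above, files from this repo should report `lfm2` and the LiquidAI files `lfm2-bidir`.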
Files
| File | Size | Description |
|---|---|---|
| lfm2-embed-q8_0.gguf | 359 MB | 8-bit quantization: best accuracy, recommended |
| lfm2-embed-q4_k.gguf | 222 MB | 4-bit K-quant: about 3× smaller than f16, minimal quality loss |
| lfm2-embed-f16.gguf | 678 MB | Full fp16: reference precision |
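The compression figure can be checked against the listed file sizes. This short sketch only does that arithmetic; the helper name `ratio_vs_f16` is made up for illustration.

```python
# File sizes in MB, copied from the Files table.
sizes_mb = {"f16": 678, "q8_0": 359, "q4_k": 222}


def ratio_vs_f16(name: str) -> float:
    """How many times smaller a file is than the f16 reference."""
    return sizes_mb["f16"] / sizes_mb[name]


for name in ("q8_0", "q4_k"):
    print(f"{name}: {ratio_vs_f16(name):.2f}x smaller than f16")
# q8_0: 1.89x smaller than f16
# q4_k: 3.05x smaller than f16
```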
Parity (CrispEmbed q8_0 vs HF float32 Lfm2BidirectionalModel)
| Stage | Cosine | Notes |
|---|---|---|
| per-layer (all 20) | ≥ 0.9999 | measured on 3-token input via test-lfm2-diff |
| CLS embedding q8_0 | 0.9999 | 5 diverse test sentences |
| CLS embedding q4_k | 0.982 | expected q4_k quantization floor |
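For reference, the cosine figures in the table compare two vectors of equal length, for example the CrispEmbed output and the HF float32 output for the same input. The sketch below shows the standard formula in plain Python. It is not the code used by test-lfm2-diff, whose implementation is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A value of 1.0 means the two vectors point in exactly the same direction, so 0.9999 indicates near-identical embeddings, while 0.982 reflects a small but measurable drift.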
Model
- Architecture: 16-layer hybrid (10 ShortConv + 6 GQA attention), hidden=1024
- Pooling: CLS token (position 0) of last hidden state, L2-normalized
- Dimension: 1024
- Languages: 11 (en, de, fr, es, it, pt, nl, pl, ru, ja, zh)
- Parameters: 350M
- Task prefixes: "query: " for queries, "document: " for passages
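The pooling step listed above can be sketched as follows. This is an illustration in plain Python, assuming the last hidden state is available as a sequence of per-token vectors; the function name `cls_pool` is hypothetical, and CrispEmbed performs this step internally.

```python
import math
from typing import List, Sequence


def cls_pool(last_hidden_state: Sequence[Sequence[float]]) -> List[float]:
    """Take the hidden vector at position 0 and scale it to unit L2 norm."""
    cls = list(last_hidden_state[0])
    norm = math.sqrt(sum(x * x for x in cls))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in cls]


# Toy example: 3 tokens with hidden size 4 (the real model uses 1024).
hidden = [
    [3.0, 4.0, 0.0, 0.0],  # position 0, the CLS token
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
print(cls_pool(hidden))  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the dot product of two embeddings equals their cosine similarity.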
Usage with CrispEmbed
CLI
# Download
./crispembed --download lfm2-embed
# Embed a query (prefix auto-applied)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf "What is the capital of France?"
# Embed a document (either disable the auto-prefix and include the prefix in the text, or pass --prefix)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf \
--prefix "document: " "Paris is the capital of France."
# JSON output for downstream use
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf --json "query: machine learning"
Python (via crispembed Python bindings)
import numpy as np
import crispembed
# Load the quantized model from the local cache
model = crispembed.load("~/.cache/crispembed/lfm2-embed-q8_0.gguf")
# Queries and passages use different task prefixes
query_emb = model.encode("query: What is the capital of France?")
doc_emb = model.encode("document: Paris is the capital of France.")
# Both vectors are already L2-normalized, so the dot product is the cosine similarity
score = np.dot(query_emb, doc_emb)
print(f"Similarity: {score:.4f}")
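The same pattern extends to ranking several passages against one query. The sketch below takes any `encode` callable, so it does not depend on the exact bindings API. It assumes `encode` returns a flat sequence of floats that is already L2-normalized and that prefixes are not added automatically on the Python side. The name `rank_passages` is illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Encoder = Callable[[str], Sequence[float]]


def rank_passages(
    encode: Encoder, query: str, passages: List[str]
) -> List[Tuple[float, str]]:
    """Score each passage against the query and return them best-first."""
    q = encode("query: " + query)
    scored = []
    for passage in passages:
        d = encode("document: " + passage)
        # Dot product equals cosine similarity for unit-length vectors.
        score = sum(x * y for x, y in zip(q, d))
        scored.append((score, passage))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

With the model loaded as in the snippet above, a call might look like `rank_passages(model.encode, "capital of France", ["Paris is the capital of France.", "Berlin is in Germany."])`.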
Rust
use crispembed::CrispEmbed;
let model = CrispEmbed::load("lfm2-embed-q8_0.gguf")?;
let emb = model.encode("query: hello world")?;
Comparison with official LiquidAI GGUFs
| | This repo | LiquidAI/LFM2.5-Embedding-350M-GGUF |
|---|---|---|
| Runtime | CrispEmbed | llama.cpp / llama-server |
| GGUF arch tag | lfm2 | lfm2-bidir |
| Tensor naming | lfm.* prefix | blk.* / llama.cpp convention |
| Quantizations | f16, q8_0, q4_k | BF16, F16, Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0 |
| q8_0 size | 359 MB | 379 MB |
| Metal GPU | Yes (Apple Silicon) | Yes |
Conversion
Convert from the source model yourself:
git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed
# Download the source model and convert it to an f16 GGUF
python models/convert-lfm2-embed-to-gguf.py \
--model LiquidAI/LFM2.5-Embedding-350M \
--output lfm2-embed-f16.gguf --dtype f16
# Quantize
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q8_0.gguf q8_0
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q4_k.gguf q4_k
License
LFM1.0, same as the base model.