nomic-embed-text-v2-moe GGUF

Quantized GGUF versions of nomic-ai/nomic-embed-text-v2-moe for use with CrispEmbed.

Model Details

Base model: nomic-ai/nomic-embed-text-v2-moe
Architecture: NomicBERT with Mixture-of-Experts (MoE)
- 12 layers, hidden size 768, 12 attention heads
- 8 experts per MoE layer, top-2 routing, GELU activation
- MoE FFN on odd-numbered layers (1, 3, 5, 7, 9, 11), dense GELU FFN on even-numbered layers
- Rotary position embeddings (RoPE), SentencePiece tokenizer (250k vocabulary)
Output: 768-dim L2-normalized embeddings
Context: 2048 tokens max
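To make the layer layout concrete, here is a generic NumPy sketch of a top-2 MoE feed-forward block and the odd/even layer schedule. It is not CrispEmbed's or Nomic's code. It assumes 0-based layer indices, omits biases, uses the tanh approximation of GELU for brevity, and assumes the router takes a softmax over all 8 experts and then renormalizes the two selected weights. The actual router normalization may differ.

```python
import numpy as np

# Sizes taken from the model card; the FFN width is whatever the caller's weights use.
HIDDEN = 768
N_EXPERTS = 8
TOP_K = 2
N_LAYERS = 12


def is_moe_layer(layer_idx: int) -> bool:
    """Odd layers (0-based: 1, 3, ..., 11) use MoE; even layers use a dense FFN."""
    return layer_idx % 2 == 1


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def moe_ffn(x, router_w, w_in, w_out, top_k=TOP_K):
    """Generic top-k mixture-of-experts FFN (biases omitted).

    x:        (tokens, hidden) activations
    router_w: (hidden, n_experts) router projection
    w_in:     (n_experts, hidden, ffn) stacked expert up-projections (a 3D tensor)
    w_out:    (n_experts, ffn, hidden) stacked expert down-projections
    """
    logits = x @ router_w
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over all experts
    top = np.argsort(-probs, axis=-1)[:, :top_k]      # best top_k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate = probs[t, top[t]]
        gate = gate / gate.sum()                      # renormalize selected weights (assumption)
        for g, e in zip(gate, top[t]):
            out[t] += g * (gelu(x[t] @ w_in[e]) @ w_out[e])
    return out
```

A useful sanity check: if every expert has the same weights, the gated sum collapses to a single dense FFN, because the two gate weights add up to 1.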

Available Quantizations

File	Quantization	Size	Cosine vs F32	Notes
`nomic-v2-moe.gguf`	F32	1818 MB	1.000000	Full precision; weights bit-exact vs HuggingFace
`nomic-v2-moe-f16.gguf`	F16	1344 MB	1.000000	No measurable loss vs F32
`nomic-v2-moe-q8_0.gguf`	Q8_0	487 MB	0.999460	Near-lossless, recommended
`nomic-v2-moe-q4_k.gguf`	Q4_K	352 MB	0.963589	Aggressive, ranking preserved
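The cosine-vs-F32 column compares each quantized file's embeddings with the F32 file's embeddings for the same input texts. A sketch of that metric follows. The card does not say whether the reported figure is a mean or a minimum over the test texts, so averaging here is an assumption, and both function names are hypothetical.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_cosine_vs_reference(reference, candidate) -> float:
    """Average per-text cosine between reference (F32) and quantized embeddings."""
    if len(reference) != len(candidate):
        raise ValueError("need one candidate embedding per reference embedding")
    return float(np.mean([cosine(r, c) for r, c in zip(reference, candidate)]))
```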

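The MoE expert weights are stored as 3D tensors (one 2D matrix per expert). The quantizer processes them one expert slice at a time, so the 8-expert FFN layers are compressed along with every other weight in the model rather than left at full precision.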
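As a rough picture of slice-by-slice quantization, the sketch below applies ggml-style Q8_0 (blocks of 32 values, one FP16 scale per block with scale = max|x| / 127, int8 codes rounded to nearest) to each 2D expert slice of an (n_experts, rows, cols) tensor. It is a NumPy illustration, not CrispEmbed's quantizer: the function names are hypothetical, rounding ties may differ from ggml, and nothing is written to GGUF.

```python
import numpy as np

QK8_0 = 32  # values per Q8_0 block, as in ggml


def quantize_q8_0_rows(w2d):
    """Q8_0-style quantization of a 2D matrix, row by row, in blocks of 32."""
    rows, cols = w2d.shape
    if cols % QK8_0:
        raise ValueError("row length must be a multiple of 32")
    blocks = w2d.reshape(rows, cols // QK8_0, QK8_0).astype(np.float32)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    scale = amax / 127.0
    # Avoid dividing by zero for all-zero blocks: their codes stay 0.
    inv = np.divide(1.0, scale, out=np.zeros_like(scale), where=scale > 0)
    q = np.round(blocks * inv).astype(np.int8)
    return scale.astype(np.float16), q


def dequantize_q8_0_rows(scale, q):
    """Rebuild an approximate float32 matrix from per-block scales and int8 codes."""
    rows = q.shape[0]
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(rows, -1)


def quantize_experts(w3d):
    """Quantize a (n_experts, rows, cols) expert tensor one 2D slice at a time."""
    return [quantize_q8_0_rows(w3d[e]) for e in range(w3d.shape[0])]
```

Treating each expert as an ordinary 2D matrix lets the same block quantizer used for dense layers cover the expert tensors unchanged.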

Usage

CrispEmbed CLI

```shell
crispembed -m nomic-v2-moe-q8_0.gguf "search_query: What is a mixture of experts?"
```

CrispEmbed Python

```python
from crispembed import CrispEmbed

model = CrispEmbed("nomic-v2-moe-q8_0.gguf")
embedding = model.encode("search_query: What is a mixture of experts?")
```
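Because the embeddings are L2-normalized, a plain dot product equals cosine similarity, which is all a basic retrieval loop needs. The sketch assumes `model.encode` returns a 768-element float vector (this card does not confirm the return type); `rank_documents` is a hypothetical helper.

```python
import numpy as np


def rank_documents(query_emb, doc_embs):
    """Return document indices sorted best-first, plus their scores.

    For L2-normalized vectors the dot product is the cosine similarity.
    """
    q = np.asarray(query_emb, dtype=np.float32)
    d = np.asarray(doc_embs, dtype=np.float32)   # (n_docs, dim)
    scores = d @ q
    order = np.argsort(-scores)
    return order.tolist(), scores[order].tolist()


# Possible use with CrispEmbed (assumes encode() returns a float vector):
# query = model.encode("search_query: What is a mixture of experts?")
# docs = [model.encode("search_document: " + t) for t in texts]
# order, scores = rank_documents(query, docs)
```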

Text Prefixes

Following the original model's convention, prefix queries with `search_query: ` and documents with `search_document: ` for best retrieval performance.
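A small helper keeps the prefixes consistent across queries and documents and avoids adding one twice. `PREFIXES` and `add_prefix` are hypothetical names, and only the two prefixes named on this card are included.

```python
# Prefixes from the model card; the helper itself is a hypothetical convenience.
PREFIXES = {"query": "search_query: ", "document": "search_document: "}


def add_prefix(text: str, kind: str) -> str:
    """Prepend the task prefix for `kind` ("query" or "document"), without doubling it."""
    prefix = PREFIXES[kind]
    return text if text.startswith(prefix) else prefix + text
```

For example, a query would be encoded as `model.encode(add_prefix(text, "query"))`.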

Parity Verification

The F32 GGUF reaches cos = 1.000000 against HuggingFace Transformers on all four test texts, as verified with the CrispEmbed parity harness:

- Token IDs: exact match (SentencePiece)
- Weight tensors: bit-exact (148/148 tensors, max|delta| = 0.0)
- End-to-end embeddings: cos = 1.000000 on 4 test texts
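The three checks above can be expressed as a small comparison routine. This is a hypothetical sketch, not the CrispEmbed parity harness: it assumes the reference and converted tensors have already been matched by name (GGUF and HuggingFace tensor names differ) and that token IDs and embeddings were collected for the same texts.

```python
import numpy as np


def max_abs_delta(a, b) -> float:
    """Largest element-wise absolute difference between two same-shape tensors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.max(np.abs(a - b))) if a.size else 0.0


def cosine(a, b) -> float:
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def parity_report(ref_ids, got_ids, ref_weights, got_weights, ref_embs, got_embs):
    """Compare a reference pipeline with a converted one.

    ref_ids/got_ids:         per-text lists of token IDs
    ref_weights/got_weights: dicts of tensor name -> array (names already mapped)
    ref_embs/got_embs:       per-text embedding vectors
    """
    ids_match = len(ref_ids) == len(got_ids) and all(
        list(r) == list(g) for r, g in zip(ref_ids, got_ids)
    )
    missing = sorted(set(ref_weights) - set(got_weights))
    deltas = {
        name: max_abs_delta(ref_weights[name], got_weights[name])
        for name in ref_weights
        if name in got_weights
    }
    exact = sum(1 for d in deltas.values() if d == 0.0)
    cosines = [cosine(r, g) for r, g in zip(ref_embs, got_embs)]
    return {
        "token_ids_match": ids_match,
        "exact_tensors": f"{exact}/{len(ref_weights)}",
        "max_abs_delta": max(deltas.values(), default=0.0),
        "missing_tensors": missing,
        "min_cosine": min(cosines) if cosines else None,
    }
```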

Conversion

Converted using CrispEmbed's `convert-bert-to-gguf.py`:

```shell
python models/convert-bert-to-gguf.py \
    --model nomic-ai/nomic-embed-text-v2-moe \
    --output nomic-v2-moe.gguf --crisp
```

Quantized using `crispembed-quantize`:

```shell
crispembed-quantize nomic-v2-moe.gguf nomic-v2-moe-q8_0.gguf q8_0
```

License

Apache 2.0, following the base model license.


Model size: 0.5B params (GGUF architecture: bert)


Model tree for cstr/nomic-embed-text-v2-moe-GGUF

- Base model: FacebookAI/xlm-roberta-base
- Finetuned: nomic-ai/nomic-xlm-2048
- Finetuned: nomic-ai/nomic-embed-text-v2-moe-unsupervised
- Finetuned: nomic-ai/nomic-embed-text-v2-moe
- Quantized: this model (one of 13 quantized versions)