nomic-embed-text-v2-moe GGUF

Quantized GGUF versions of nomic-ai/nomic-embed-text-v2-moe for use with CrispEmbed.

Model Details

  • Base model: nomic-ai/nomic-embed-text-v2-moe
  • Architecture: NomicBERT with Mixture-of-Experts (MoE)
    • 12 layers, 768 hidden, 12 heads
    • 8 experts, top-2 routing, GELU activation
    • MoE on odd layers (1,3,5,7,9,11), dense GELU FFN on even layers
    • RoPE positional encoding, SentencePiece tokenizer (250k vocab)
  • Output: 768-dim L2-normalized embeddings
  • Context: 2048 tokens max
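
To illustrate what "8 experts, top-2 routing" means for a single token, here is a minimal pure-Python sketch. It assumes the router applies a softmax over all expert logits and then renormalizes the two selected weights; the exact gating used by the base model is not stated here, so treat that detail, and the helper names `top2_route` and `moe_ffn`, as illustrative assumptions.

```python
import math


def top2_route(router_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for one token.

    Assumption: softmax over all experts, then renormalize the top-2 weights.
    """
    # Numerically stable softmax over the expert logits.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Select the two experts with the highest probability.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]

    # Renormalize so the two selected weights sum to 1.
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def moe_ffn(token, router_logits, experts):
    """Weighted sum of the outputs of the two selected experts."""
    out = [0.0] * len(token)
    for idx, weight in top2_route(router_logits):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy setup: 8 stand-in "experts" that scale the input by (index + 1).
experts = [lambda x, k=k: [v * (k + 1) for v in x] for k in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

routes = top2_route(logits)                    # experts 1 and 4 are selected
mixed = moe_ffn([1.0, -2.0], logits, experts)  # blend of the two expert outputs
```

Only two of the eight expert FFNs run per token, which is why the per-token compute is much smaller than the total parameter count suggests.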

Available Quantizations

| File | Quantization | Size | Cos vs F32 | Notes |
|---|---|---|---|---|
| nomic-v2-moe.gguf | F32 | 1818 MB | 1.000000 | Full precision, bit-exact vs HuggingFace |
| nomic-v2-moe-f16.gguf | F16 | 1344 MB | 1.000000 | Lossless for this model |
| nomic-v2-moe-q8_0.gguf | Q8_0 | 487 MB | 0.999460 | Near-lossless, recommended |
| nomic-v2-moe-q4_k.gguf | Q4_K | 352 MB | 0.963589 | Aggressive, ranking preserved |

MoE expert weights are stored as 3D tensors and are quantized slice-by-slice, one 2D matrix per expert. As a result, the 8-expert FFN layers are compressed along with the rest of the model weights rather than being left in full precision.
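
The slice-by-slice idea can be sketched in a few lines of Python. This is a simplified illustration, not the code used by crispembed-quantize: it follows the general Q8_0 scheme (blocks of 32 values sharing one scale, values stored as 8-bit integers), but keeps the scale as a Python float, whereas the real format stores it as a 16-bit float. The function names are hypothetical.

```python
import math

BLOCK = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_block(values):
    """Quantize one block to (scale, list of ints in [-127, 127])."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 0.0
    inv = 1.0 / scale if scale else 0.0
    q = [max(-127, min(127, round(v * inv))) for v in values]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate float values from a quantized block."""
    return [scale * v for v in q]


def quantize_matrix(rows):
    """Quantize a 2D matrix row by row, one block of 32 values at a time."""
    packed = []
    for row in rows:
        assert len(row) % BLOCK == 0, "row length must be a multiple of 32"
        packed.append(
            [quantize_block(row[i:i + BLOCK]) for i in range(0, len(row), BLOCK)]
        )
    return packed


def quantize_experts(tensor3d):
    """Treat an [n_experts][rows][cols] tensor as independent 2D slices."""
    return [quantize_matrix(expert_slice) for expert_slice in tensor3d]


# Toy 3D tensor: 2 experts, 2 rows, 32 columns of deterministic values.
tensor = [
    [[math.sin(0.1 * (e * 64 + r * 32 + c)) for c in range(32)] for r in range(2)]
    for e in range(2)
]
packed = quantize_experts(tensor)
```

Because each expert slice is an ordinary 2D matrix, the same block quantizer that handles the dense layers can be reused unchanged for the expert weights.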

Usage

CrispEmbed CLI

crispembed -m nomic-v2-moe-q8_0.gguf "search_query: What is a mixture of experts?"

CrispEmbed Python

from crispembed import CrispEmbed

model = CrispEmbed("nomic-v2-moe-q8_0.gguf")
embedding = model.encode("search_query: What is a mixture of experts?")
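
Because the output embeddings are L2-normalized, the dot product of two embeddings equals their cosine similarity, so ranking documents against a query needs no extra normalization. The sketch below uses hand-made unit vectors in place of real model output; it assumes that `encode` returns something convertible to a sequence of floats, which is not confirmed here, and `rank_documents` is a hypothetical helper.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hand-made unit vectors stand in for embeddings produced by the model.
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.6, 0.8, 0.0],  # partially aligned
    [1.0, 0.0, 0.0],  # identical direction
]
ranking = rank_documents(query, docs)
```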

Text Prefixes

Following the original model's convention, prefix queries with search_query: and documents with search_document: for best retrieval performance.
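
A small helper keeps the prefixes consistent across an indexing and query pipeline. This is only a sketch: the function names `as_query` and `as_document` are made up for illustration, and the prefix strings follow the format used in the usage examples above (prefix, colon, single space).

```python
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def as_query(text):
    """Prepend the query prefix to a search query."""
    return QUERY_PREFIX + text.strip()


def as_document(text):
    """Prepend the document prefix to a passage that will be indexed."""
    return DOCUMENT_PREFIX + text.strip()


queries = [as_query("What is a mixture of experts?")]
documents = [
    as_document("A mixture of experts routes each token to a subset of FFNs."),
    as_document("  GGUF is a file format for quantized models.  "),
]
```

The prefixed strings are what should be passed to the CLI or to `encode`; mixing prefixed and unprefixed text in one index is likely to hurt retrieval quality.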

Parity Verification

The F32 GGUF achieves cos = 1.000000 against HuggingFace Transformers across all test texts, verified with the CrispEmbed parity harness:

  • Token IDs: exact match (SentencePiece)
  • Weight tensors: bit-exact (148/148 tensors, max|delta| = 0.0)
  • End-to-end embeddings: cos = 1.000000 on 4 test texts
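
The two metrics reported above, cosine similarity and max|delta|, can be computed as follows. This is a generic sketch rather than the CrispEmbed parity harness itself, whose interface is not described here; the vectors are toy values standing in for a reference output and a candidate output.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_delta(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy vectors: a reference, an exact copy, and a slightly perturbed copy.
reference = [0.5, 0.5, 0.5, 0.5]
candidate = [0.5, 0.5, 0.5, 0.5]
perturbed = [0.5, 0.5, 0.5, 0.49]

exact_cos = cosine(reference, candidate)           # bit-exact copy
exact_delta = max_abs_delta(reference, candidate)  # no element differs
lossy_cos = cosine(reference, perturbed)           # slightly below 1
```

A max|delta| of exactly 0.0 is the stricter of the two checks: a cosine that rounds to 1.000000 can still hide small element-wise differences.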

Conversion

Converted using CrispEmbed's convert-bert-to-gguf.py:

python models/convert-bert-to-gguf.py \
    --model nomic-ai/nomic-embed-text-v2-moe \
    --output nomic-v2-moe.gguf --crisp

Quantized using crispembed-quantize:

crispembed-quantize nomic-v2-moe.gguf nomic-v2-moe-q8_0.gguf q8_0

License

Apache 2.0, following the base model license.
