Tags: Sentence Similarity, GGUF, sentence-transformers, English, multilingual, crispembed, embedding, Mixture of Experts, mixture-of-experts
nomic-embed-text-v2-moe GGUF
Quantized GGUF versions of nomic-ai/nomic-embed-text-v2-moe for use with CrispEmbed.
Model Details
- Base model: nomic-ai/nomic-embed-text-v2-moe
- Architecture: NomicBERT with Mixture-of-Experts (MoE)
- 12 layers, hidden size 768, 12 attention heads
- 8 experts with top-2 routing; GELU activation
- MoE FFN on odd layers (0-indexed: 1, 3, 5, 7, 9, 11); dense GELU FFN on even layers
- RoPE positional encoding, SentencePiece tokenizer (250k vocab)
- Output: 768-dim L2-normalized embeddings
- Context: 2048 tokens max
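To make the MoE layout above concrete, here is a minimal NumPy sketch of a top-2 mixture-of-experts FFN for a single token, plus the odd-layer placement rule. It is a generic illustration under assumptions, not NomicBERT's or CrispEmbed's implementation: the function names are hypothetical, the softmax router and the renormalization of the two selected gate weights are assumptions, and attention, layer norm, and residual connections are omitted.

```python
import numpy as np

N_LAYERS = 12
N_EXPERTS = 8
TOP_K = 2


def is_moe_layer(layer_idx: int) -> bool:
    # 0-indexed layers 1, 3, 5, 7, 9, 11 use the MoE FFN; even layers use a dense FFN.
    return layer_idx % 2 == 1


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU (the exact erf form is also common)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def moe_ffn(x, router_w, w_in, w_out, top_k=TOP_K):
    """Top-k MoE feed-forward for a single token (hypothetical sketch).

    x: (d,)  router_w: (n_experts, d)  w_in: (n_experts, d_ff, d)  w_out: (n_experts, d, d_ff)
    """
    logits = router_w @ x                       # one routing logit per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over experts
    top = np.argsort(probs)[::-1][:top_k]       # indices of the top-k experts, best first
    gate = probs[top] / probs[top].sum()        # renormalized gate weights (assumption)
    out = np.zeros_like(x)
    for e, g in zip(top, gate):
        # only the selected experts run; their outputs are mixed by the gate weights
        out += g * (w_out[e] @ gelu(w_in[e] @ x))
    return out, top


# Toy usage with small random weights (the real model uses d = 768).
rng = np.random.default_rng(0)
d, d_ff = 16, 32
out, top = moe_ffn(rng.normal(size=d),
                   rng.normal(size=(N_EXPERTS, d)),
                   rng.normal(size=(N_EXPERTS, d_ff, d)),
                   rng.normal(size=(N_EXPERTS, d, d_ff)))
print([i for i in range(N_LAYERS) if is_moe_layer(i)], top, out.shape)
```

Only two of the eight experts run per token, which is why the MoE model's per-token compute is much smaller than its total parameter count suggests.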
Available Quantizations
| File | Quantization | Size | Cosine vs F32 | Notes |
|---|---|---|---|---|
| nomic-v2-moe.gguf | F32 | 1818 MB | 1.000000 | Full precision, bit-exact vs HuggingFace |
| nomic-v2-moe-f16.gguf | F16 | 1344 MB | 1.000000 | Lossless for this model |
| nomic-v2-moe-q8_0.gguf | Q8_0 | 487 MB | 0.999460 | Near-lossless, recommended |
| nomic-v2-moe-q4_k.gguf | Q4_K | 352 MB | 0.963589 | Aggressive, ranking preserved |
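One plausible way to compute a "Cosine vs F32" score and a "ranking preserved" check is sketched below. The card does not say whether the column is a mean or a minimum over texts, or which texts were used; the mean, the choice of query row, and the helper names are assumptions. The inputs are (n_texts, dim) embedding matrices produced by the F32 model and by a quantized model for the same texts.

```python
import numpy as np


def _normalize(m: np.ndarray) -> np.ndarray:
    # row-wise L2 normalization
    m = np.asarray(m, dtype=np.float64)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def mean_cosine(ref: np.ndarray, test: np.ndarray) -> float:
    """Mean row-wise cosine between two (n_texts, dim) embedding matrices."""
    return float(np.mean(np.sum(_normalize(ref) * _normalize(test), axis=1)))


def same_ranking(ref: np.ndarray, test: np.ndarray, query_row: int = 0) -> bool:
    """True if both models order all rows identically by similarity to row `query_row`."""
    r, t = _normalize(ref), _normalize(test)
    return bool(np.array_equal(np.argsort(-(r @ r[query_row])),
                               np.argsort(-(t @ t[query_row]))))


# Hypothetical wiring (assumes CrispEmbed.encode returns one vector per string):
#   texts = [...]
#   ref  = np.stack([CrispEmbed("nomic-v2-moe.gguf").encode(s) for s in texts])
#   test = np.stack([CrispEmbed("nomic-v2-moe-q4_k.gguf").encode(s) for s in texts])
#   print(mean_cosine(ref, test), same_ranking(ref, test))
```

A cosine below 1.0 can coexist with an unchanged ranking, which is how an aggressive quantization such as Q4_K can still be usable for retrieval.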
MoE expert weights are stored as 3D tensors (one 2D matrix per expert) and are quantized slice by slice, so quantization covers all model weights, including the 8-expert FFN layers.
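The slice-by-slice idea can be illustrated with a Q8_0-style block quantizer (blocks of 32 values, one fp16 scale and 32 int8 values per block, as in ggml's Q8_0 format) applied to each expert's 2D slice independently. This is a minimal sketch, not crispembed-quantize's code: the function names are hypothetical, and Q4_K uses a different super-block layout that is not shown.

```python
import numpy as np

QK8_0 = 32  # values per block in ggml's Q8_0 format


def quantize_q8_0(w: np.ndarray):
    """Q8_0-style quantization of a 2D matrix: per block of 32 values, one fp16 scale and 32 int8s."""
    w = np.asarray(w, dtype=np.float32)
    rows, cols = w.shape
    if cols % QK8_0:
        raise ValueError("row length must be a multiple of 32")
    blocks = w.reshape(rows, cols // QK8_0, QK8_0)
    d = np.abs(blocks).max(axis=-1, keepdims=True) / 127.0       # per-block scale
    inv = np.divide(1.0, d, out=np.zeros_like(d), where=d != 0)  # all-zero blocks stay zero
    qs = np.clip(np.rint(blocks * inv), -127, 127).astype(np.int8)
    return d.astype(np.float16), qs


def dequantize_q8_0(scales: np.ndarray, qs: np.ndarray) -> np.ndarray:
    # int8 values times their block's scale, flattened back to (rows, cols)
    blocks = qs.astype(np.float32) * scales.astype(np.float32)
    return blocks.reshape(qs.shape[0], -1)


def quantize_experts(w3d: np.ndarray):
    """Slice by slice: quantize each expert's 2D matrix of a (n_experts, rows, cols) tensor."""
    return [quantize_q8_0(w3d[e]) for e in range(w3d.shape[0])]
```

Each block of 32 weights costs 34 bytes (32 int8 values plus a 2-byte scale), or 8.5 bits per weight, which is consistent with the Q8_0 file being roughly a quarter of the F32 size.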
Usage
CrispEmbed CLI
crispembed -m nomic-v2-moe-q8_0.gguf "search_query: What is a mixture of experts?"
CrispEmbed Python
from crispembed import CrispEmbed
model = CrispEmbed("nomic-v2-moe-q8_0.gguf")
embedding = model.encode("search_query: What is a mixture of experts?")
Text Prefixes
Following the original model's convention, prefix queries with search_query: and documents with search_document: for best retrieval performance.
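A small retrieval helper that applies both prefixes and ranks documents by cosine similarity is sketched below. The helper names are hypothetical, and `encode` is any callable that maps one string to one embedding vector; passing CrispEmbed's `encode` assumes it returns a single 1D vector for a single string. Because the model's output is L2-normalized, the dot product equals cosine similarity; the sketch normalizes again only as a safeguard.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

QUERY_PREFIX = "search_query: "
DOC_PREFIX = "search_document: "


def _unit(v: Sequence[float]) -> np.ndarray:
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)  # output is documented as L2-normalized; cheap safeguard


def rank_documents(encode: Callable[[str], Sequence[float]], query: str,
                   docs: List[str]) -> List[Tuple[float, str]]:
    """Score each document against the query by cosine similarity, best match first."""
    q = _unit(encode(QUERY_PREFIX + query))
    scored = [(float(q @ _unit(encode(DOC_PREFIX + doc))), doc) for doc in docs]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical wiring with the CrispEmbed Python API shown above:
#   model = CrispEmbed("nomic-v2-moe-q8_0.gguf")
#   rank_documents(model.encode, "What is a mixture of experts?", docs)
```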
Parity Verification
The F32 GGUF achieves cos = 1.000000 against HuggingFace Transformers across all test texts, verified with the CrispEmbed parity harness:
- Token IDs: exact match (SentencePiece)
- Weight tensors: bit-exact (148/148 tensors, max|delta| = 0.0)
- End-to-end embeddings: cos = 1.000000 on 4 test texts
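The weight-tensor check can be sketched as follows. This is not the CrispEmbed parity harness; it assumes both checkpoints have already been loaded into dicts of NumPy arrays keyed by a shared tensor name (loading the files and mapping HuggingFace names to GGUF names are not shown and are assumptions). "Bit-exact" here means the same dtype and identical bytes.

```python
from typing import Dict, Tuple

import numpy as np


def compare_tensors(ref: Dict[str, np.ndarray],
                    test: Dict[str, np.ndarray]) -> Tuple[int, int, float]:
    """Return (bit-exact count, total count, max |delta|) over tensors with matching names."""
    if set(ref) != set(test):
        raise KeyError(f"tensor names differ: {sorted(set(ref) ^ set(test))}")
    n_exact, max_delta = 0, 0.0
    for name, a in ref.items():
        b = test[name]
        if a.shape != b.shape:
            raise ValueError(f"{name}: shape {a.shape} vs {b.shape}")
        # bit-exact: same dtype and identical underlying bytes
        n_exact += int(a.dtype == b.dtype and a.tobytes() == b.tobytes())
        if a.size:
            diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
            max_delta = max(max_delta, float(diff.max()))
    return n_exact, len(ref), max_delta
```

A result of (148, 148, 0.0) on the real checkpoints would correspond to the "148/148 tensors, max|delta| = 0.0" line above.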
Conversion
Converted using CrispEmbed's convert-bert-to-gguf.py:
python models/convert-bert-to-gguf.py \
--model nomic-ai/nomic-embed-text-v2-moe \
--output nomic-v2-moe.gguf --crisp
Quantized using crispembed-quantize:
crispembed-quantize nomic-v2-moe.gguf nomic-v2-moe-q8_0.gguf q8_0
License
Apache 2.0, following the base model license.
Model Tree
- FacebookAI/xlm-roberta-base (base model)
- nomic-ai/nomic-xlm-2048 (fine-tuned from xlm-roberta-base)
- nomic-ai/nomic-embed-text-v2-moe (fine-tuned from nomic-xlm-2048)