Octen-Embedding-0.6B - GGUF
GGUF conversion of Octen/Octen-Embedding-0.6B for use with CrispEmbed.
Model Details
- Architecture: Qwen3 decoder with GQA (16 Q heads, 8 KV heads, head_dim=128)
- Parameters: 0.6B (28 layers, 1024 hidden, 3072 intermediate)
- Embedding dim: 1024
- Pooling: Last-token
- Tokenizer: GPT-2 BPE (151K vocab)
- RoPE: theta=1,000,000
- License: Apache-2.0
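The pooling entry above can be illustrated with a small sketch. This is a pure-Python toy, not CrispEmbed's implementation: the function names `last_token_pool` and `l2_normalize` are hypothetical, and the 3-dimensional vectors stand in for the model's 1024-dimensional hidden states. It assumes an attention mask that marks real tokens with 1 and padding with 0.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    hidden_states: list of per-token vectors (seq_len x dim)
    attention_mask: list of 0/1 flags, 1 marks a real token
    """
    last_index = max(i for i, flag in enumerate(attention_mask) if flag == 1)
    return hidden_states[last_index]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy input: 4 token positions, 3 dimensions; the last position is padding.
hidden = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 4.0, 0.0],
    [9.0, 9.0, 9.0],  # padding position, ignored by the mask
]
mask = [1, 1, 1, 0]

embedding = l2_normalize(last_token_pool(hidden, mask))
print(embedding)  # [0.6, 0.8, 0.0]
```

In a decoder-only model each token attends only to earlier tokens, so the final real token is the one position that has seen the whole input, which is why it is a natural choice for the sentence embedding.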
Files
| File | Type | Size | CosSim vs HF |
|---|---|---|---|
| octen-0.6b-f32.gguf | F32 | 2.3 GB | 0.9999 |
| octen-0.6b-q8_0.gguf | Q8_0 | 609 MB | 0.9993 |
| octen-0.6b-q4_k.gguf | Q4_K | 325 MB | 0.9570 |
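To give a feel for why the 8-bit file stays so close to the F32 output, here is a simplified sketch of block-wise 8-bit quantization in the spirit of Q8_0. It is an illustration under stated assumptions, not the ggml code: it assumes blocks of 32 weights sharing one scale of max|x|/127, keeps the scale as a Python float rather than the half-precision value a real file would store, and the helper names are hypothetical. Q4_K uses a more elaborate scheme that is not shown here.

```python
BLOCK_SIZE = 32  # assumption: Q8_0-style blocks of 32 weights sharing one scale

def quantize_block(values):
    """Map a block of floats to (scale, int8-range quants)."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * len(values)
    # Round to the nearest integer step and clamp to the signed 8-bit range used here.
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    """Reconstruct approximate floats from the scale and integer quants."""
    return [scale * q for q in quants]

# Hypothetical weights: alternating signs with growing magnitude up to 1.0.
weights = [(-1) ** i * (i + 1) / 32.0 for i in range(BLOCK_SIZE)]

scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6f} max_error={max_error:.6f}")
```

With this scheme the per-weight error is bounded by half a quantization step (scale / 2), which is small relative to the largest weight in each block.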
Usage with CrispEmbed
./crispembed -m octen-0.6b-q8_0.gguf "Hello world"
# prints 1024-dim L2-normalized embedding
# Server mode
./crispembed-server -m octen-0.6b-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/embed -d '{"texts": ["Hello world"]}'
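The same request can be sent from Python using only the standard library. This is a minimal sketch: the `/embed` path, port 8080, and the `{"texts": [...]}` body come from the curl command above, while the helper names, the `Content-Type: application/json` header, and the timeout are assumptions. The response schema is not documented here, so the sketch returns the decoded JSON as-is without interpreting it.

```python
import json
import urllib.request

def build_embed_request(texts, base_url="http://localhost:8080"):
    """Build a POST request for the /embed endpoint without sending it."""
    payload = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=payload,
        # Assumption: the server accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(texts, base_url="http://localhost:8080", timeout=30.0):
    """Send the request and return the server's decoded JSON response."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With crispembed-server running as shown above, `embed(["Hello world"])` would return whatever JSON the server produces for that input.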
Conversion
Converted from the original PyTorch model using models/convert-decoder-embed-to-gguf.py from the CrispEmbed repo. Verified against HuggingFace sentence-transformers output: cosine similarity ≥ 0.999 for the F32 and Q8_0 files, and 0.9570 for Q4_K.
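A check of this kind can be sketched as a cosine-similarity comparison between an embedding from the GGUF file and a reference embedding for the same text. The sketch below is illustrative only: the function names are hypothetical, the 0.999 threshold is the one quoted above, and the short vectors are made-up stand-ins for real 1024-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def matches_reference(candidate, reference, threshold=0.999):
    """True when the candidate embedding is within the similarity threshold."""
    return cosine_similarity(candidate, reference) >= threshold

# Hypothetical values: a reference embedding and a slightly perturbed copy.
reference = [0.6, 0.8, 0.0]
candidate = [0.59, 0.81, 0.01]

print(cosine_similarity(candidate, reference))
print(matches_reference(candidate, reference))
```

In practice the comparison would be averaged over a set of test sentences rather than a single pair, since one input says little about overall fidelity.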
Model tree for cstr/Octen-Embedding-0.6B-GGUF
- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned: Qwen/Qwen3-Embedding-0.6B
- Finetuned: Octen/Octen-Embedding-0.6B