Vortex-Embed-4.7M

Vortex-Embed-4.7M is an ultra-lightweight, 4-bit quantized static sentence embedding model built for high-throughput semantic code search and tool retrieval. It produces 256-dimensional embeddings from a 4.7 MB weights file and needs no deep learning framework such as PyTorch or Hugging Face Transformers. That makes it a good fit for edge computing, local IDE plugins, and resource-constrained CLI tools.

This model is the native default embedder in vortexa, the open-source, AST-aware codebase indexing and semantic search engine.


⚡ Key Highlights

  • Zero Heavy Dependencies: Built only on NumPy, Safetensors, and Tokenizers. No PyTorch, no execution graph, no CUDA requirement.
  • Aggressive Compression: 6.4× smaller than FP32 via LF4 block quantization, while preserving a cosine similarity of 0.9969 (99.69%) relative to the unquantized FP32 baseline.
  • Blazing Fast Execution: Sub-millisecond inference (~0.15 ms per text string); brute-force search time scales linearly with the number of indexed items.

📊 Performance Benchmarks

Quantization Fidelity & Speed

All metrics were measured on a commodity x86 CPU.

| Metric | Value | Notes |
|---|---|---|
| Cosine preservation (vs. FP32) | 0.9969 | Near-zero degradation in vector geometry |
| Mean squared error (MSE) | 0.257 | Quantization error relative to FP32, across the vocabulary |
| Inference latency | ~0.15 ms | Per single text encoding |
| Cold boot / load time | ~144 ms | From disk to an initialized in-memory model |
| Local search latency | 14.6 ms | P50 across 2,707 indexed code chunks |
| Tool search accuracy | 100% | 15/15 strict functional tool-intent matches |
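The card does not define the two fidelity metrics precisely. A minimal NumPy sketch of one plausible reading follows: mean row-wise cosine similarity and element-wise MSE between the FP32 embedding matrix and its dequantized counterpart. The function name `quantization_fidelity` and the random test matrices are illustrative assumptions, not the evaluation script behind the numbers above.

```python
import numpy as np

def quantization_fidelity(w_fp32: np.ndarray, w_deq: np.ndarray) -> tuple:
    """Return (mean row-wise cosine, MSE) between two (vocab, dim) matrices.

    Assumed metric definitions; the card does not specify them exactly.
    """
    dots = np.sum(w_fp32 * w_deq, axis=1)
    norms = np.linalg.norm(w_fp32, axis=1) * np.linalg.norm(w_deq, axis=1)
    cosine = float(np.mean(dots / np.maximum(norms, 1e-12)))  # guard zero rows
    mse = float(np.mean((w_fp32 - w_deq) ** 2))                 # element-wise
    return cosine, mse

# Illustrative check: random "FP32" weights plus small synthetic noise
rng = np.random.default_rng(0)
w = rng.standard_normal((1000, 256)).astype(np.float32)
w_noisy = w + 0.05 * rng.standard_normal(w.shape).astype(np.float32)
cosine, mse = quantization_fidelity(w, w_noisy)
print(f"cosine={cosine:.4f}  mse={mse:.5f}")
```

Cosine captures whether vector directions survive quantization, which is what retrieval ranking depends on; MSE also reflects the absolute scale of the weights, so a large MSE can coexist with near-perfect cosine.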

Architectural Efficiency Comparison

Why choose a quantized static embedding over a traditional Transformer-based bi-encoder architecture?

| Feature | Vortex-Embed-4.7M (static) | BGE / BERT-base (Transformer) |
|---|---|---|
| Inference latency | 🚀 0.15 ms | ~50 ms |
| Cold-start latency | 🚀 144 ms | ~5,000 ms |
| On-disk footprint | 🚀 4.7 MB | ~400+ MB |
| Hardware prerequisite | Commodity CPU | Dedicated GPU highly recommended |
| Domain focus | Optimized for code / tools | General text semantics |

🛠️ Architecture & Quantization Details

The model uses a learned static token-embedding matrix stored with custom LF4 per-block quantization. A sentence is encoded in four steps: tokenization, row lookup with inline dequantization, mean pooling over the token vectors, and L2 normalization.
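The encoding path can be sketched in plain NumPy. This is a minimal sketch under assumptions the card does not confirm: each byte stores the even dimension in its low nibble and the odd dimension in its high nibble, and each block dequantizes as `w = q * scale + zero`. The helper names `dequantize_rows` and `encode_ids` are hypothetical, and tokenization is skipped by passing token IDs directly.

```python
import numpy as np

BLOCK_SIZE = 32  # values per quantization block (block_size from the card)

def dequantize_rows(packed, scales, zeros, ids):
    """Dequantize selected rows to float32.

    ASSUMPTIONS: low nibble = even dimension, high nibble = odd dimension,
    and each block is reconstructed as w = q * scale + zero.
    """
    p = packed[ids]                                    # (n, dim // 2) uint8
    q = np.empty((p.shape[0], p.shape[1] * 2), dtype=np.float32)
    q[:, 0::2] = p & 0x0F                              # low nibbles
    q[:, 1::2] = p >> 4                                # high nibbles
    n, dim = q.shape
    q = q.reshape(n, dim // BLOCK_SIZE, BLOCK_SIZE)    # split rows into blocks
    s = scales[ids].astype(np.float32)[:, :, None]     # (n, blocks, 1)
    z = zeros[ids].astype(np.float32)[:, :, None]
    return (q * s + z).reshape(n, dim)

def encode_ids(token_ids, packed, scales, zeros):
    """Look up and dequantize token rows, mean-pool them, then L2-normalize."""
    vecs = dequantize_rows(packed, scales, zeros, np.asarray(token_ids))
    pooled = vecs.mean(axis=0)
    return pooled / max(np.linalg.norm(pooled), 1e-12)

# Toy tensors with the card's layout (random values, small hypothetical vocabulary)
rng = np.random.default_rng(0)
vocab, dim = 100, 256
packed = rng.integers(0, 256, size=(vocab, dim // 2)).astype(np.uint8)
scales = rng.uniform(0.01, 0.1, size=(vocab, dim // BLOCK_SIZE)).astype(np.float16)
zeros = rng.uniform(-0.5, 0.0, size=(vocab, dim // BLOCK_SIZE)).astype(np.float16)
emb = encode_ids([5, 17, 42], packed, scales, zeros)
print(emb.shape)
```

Dequantizing only the rows a sentence actually uses keeps memory at the size of the packed matrix; the full FP32 table is never materialized.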

Structural Topology

vocab_size = 29,528 | dimensions = 256 | bits = 4 | block_size = 32
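These four parameters fully determine the weights size. The arithmetic below reproduces the 4.7 MB footprint and the 6.4× compression ratio, assuming MB means 10^6 bytes and ignoring the small safetensors header:

```python
vocab_size, dims, bits, block_size = 29_528, 256, 4, 32

packed_bytes = vocab_size * dims * bits // 8        # two 4-bit codes per byte
blocks_per_row = dims // block_size                 # 256 / 32 = 8 blocks
meta_bytes = 2 * vocab_size * blocks_per_row * 2    # float16 scales + zeros, 2 bytes each
total_bytes = packed_bytes + meta_bytes
fp32_bytes = vocab_size * dims * 4                  # unquantized float32 matrix

print(f"total: {total_bytes:,} bytes ({total_bytes / 1e6:.2f} MB)")  # total: 4,724,480 bytes (4.72 MB)
print(f"compression vs FP32: {fp32_bytes / total_bytes:.1f}x")      # compression vs FP32: 6.4x
```

Each value effectively costs 5 bits: 4 for the code plus 32 bits of float16 scale and zero shared across a 32-value block. Since 32 / 5 = 6.4, the ratio follows directly from the block size.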

Tensor Layout

The weights are stored as three tensors in a standard .safetensors file:

| Tensor | Data type | Shape | Description |
|---|---|---|---|
| embedding_packed | uint8 | (29528, 128) | 4-bit codes packed two per byte (256 values per row) |
| embedding_scales | float16 | (29528, 8) | Half-precision scale, one per 32-value block |
| embedding_zeros | float16 | (29528, 8) | Half-precision zero-point offset, one per 32-value block |
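Before dequantizing, a loader can check that a weights file matches this table. A minimal NumPy sketch; the `validate_layout` helper is hypothetical and operates on a name-to-array dict such as the one `safetensors.numpy.load_file` returns:

```python
import numpy as np

# Expected layout from the tensor table: name -> (dtype, columns per row)
EXPECTED = {
    "embedding_packed": (np.uint8, 128),   # 256 dims / 2 codes per byte
    "embedding_scales": (np.float16, 8),   # 256 dims / 32-value blocks
    "embedding_zeros": (np.float16, 8),
}

def validate_layout(tensors: dict) -> int:
    """Check tensor names, dtypes, and shapes; return the vocabulary size."""
    vocab = None
    for name, (dtype, cols) in EXPECTED.items():
        if name not in tensors:
            raise KeyError(f"missing tensor: {name}")
        t = tensors[name]
        if t.dtype != dtype:
            raise TypeError(f"{name}: expected {np.dtype(dtype)}, got {t.dtype}")
        if t.ndim != 2 or t.shape[1] != cols:
            raise ValueError(f"{name}: expected (vocab, {cols}), got {t.shape}")
        if vocab is None:
            vocab = t.shape[0]
        elif t.shape[0] != vocab:
            raise ValueError(f"{name}: {t.shape[0]} rows, expected {vocab}")
    return vocab

# Dummy tensors with the documented shapes
vocab = 29_528
demo = {
    "embedding_packed": np.zeros((vocab, 128), dtype=np.uint8),
    "embedding_scales": np.ones((vocab, 8), dtype=np.float16),
    "embedding_zeros": np.zeros((vocab, 8), dtype=np.float16),
}
print(validate_layout(demo))  # 29528
```

With real weights, call `validate_layout(load_file(path))` after `from safetensors.numpy import load_file`; the exact weights filename in the repository is not stated here.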

🚀 Quickstart: Installation & Usage

Prerequisites

```bash
pip install numpy safetensors tokenizers
```

1. Codebase Indexing with vortexa

For turnkey directory indexing, search, and MCP support, use the official core engine:

```bash
pip install vortexa
```

```python
from vortexa.core.indexer import CodebaseIndexer

# vortexa resolves and loads Vortex-Embed-4.7M by default
indexer = CodebaseIndexer(root='.')
stats = indexer.index()

# Retrieve the 5 code chunks most similar to the query
results = indexer.search('find CSV parser or file tokenizer', top_k=5)
```

2. Standalone Low-Level Inference (No Torch Pipeline)

For custom applications or minimal CLI tools requiring zero framework overhead:

```python
# Assumes lf4_model.py (the reference loader for this model) is importable
from lf4_model import LF4StaticEmbedding

# Load the 4-bit quantized weights and the tokenizer
model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')

# Encode text into L2-normalized NumPy arrays
doc_emb = model.encode(['search the web', 'read file'])
query_emb = model.encode(['open a local file'])

# Rank documents by cosine similarity to the query (only 2 documents here)
scores, indices = model.search(query_emb, doc_emb, top_k=2)
```
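`model.search` is not documented further here. For reference, brute-force cosine search over L2-normalized embeddings reduces to one matrix product plus a top-k selection, which is why search time grows linearly with the number of indexed chunks. A minimal NumPy sketch, not the `lf4_model` implementation; `top_k_search` is a hypothetical name:

```python
import numpy as np

def top_k_search(query_emb, doc_emb, top_k=10):
    """Brute-force cosine search over L2-normalized embeddings.

    query_emb: (q, d), doc_emb: (n, d). Returns (scores, indices), each of
    shape (q, k), sorted by descending score.
    """
    sims = query_emb @ doc_emb.T                          # dot = cosine for unit vectors
    k = min(top_k, doc_emb.shape[0])
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]    # k best, unordered, O(n)
    part = np.take_along_axis(sims, idx, axis=1)
    order = np.argsort(-part, axis=1)                      # sort only the k survivors
    indices = np.take_along_axis(idx, order, axis=1)
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices

docs = np.eye(4, dtype=np.float32)                         # four orthogonal unit vectors
query = np.array([[0.6, 0.8, 0.0, 0.0]], dtype=np.float32)  # unit norm
scores, indices = top_k_search(query, docs, top_k=2)
print(indices)  # [[1 0]]
```

`argpartition` avoids fully sorting every score, so only the k retained candidates are ordered.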

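3. Sentence Transformers Compatibility

If you prefer to work within standard ML pipelines, Sentence Transformers can load the model. Static embedding models are handled by its `StaticEmbedding` module under the default backend (valid `backend` values are `torch`, `onnx`, and `openvino`). Note that `sentence-transformers` depends on PyTorch, so this path gives up the framework-free footprint.

```bash
pip install sentence-transformers
```

```python
from sentence_transformers import SentenceTransformer

# Assumes the repository ships a Sentence Transformers configuration
model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M')
embeddings = model.encode(['search the web', 'read file'])
```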

📜 Citation

If you use this model or the vortexa engine in research, production, or industrial applications, please cite the repository with the following BibTeX entry:

```bibtex
@software{vortex-embed-4.7m,
  title  = {Vortex-Embed-4.7M: High-Performance 4-Bit Static Embedding Topology},
  author = {VortexAI},
  year   = {2025},
  url    = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
}
```