Vortex-Embed v2

Retrieval-optimized 4-bit static embeddings for code search.

Built on VTXAI/Vortex-Embed-4.7M (29,528-token vocabulary × 256 dimensions, 4-bit LF4 packed, 4.7 MB on disk), with a set of training-free retrieval upgrades that lift R@1 from 0.314 to 0.745 on the Webscout codebase benchmark (51 hand-verified code queries, 5,168 chunks across 349 files).

What changed vs the v1 model

All four upgrades are inference-time only: the underlying 4-bit weights are bit-identical to the v1 artifact. They are:

  1. SIF IDF weighting. Each token's contribution is scaled by a / (a + p(t)), where a is a small smoothing constant (sif_a) and p(t) is the token's relative frequency in the corpus. Common tokens ("import", "def", "class") are down-weighted; rare tokens are amplified.
  2. Top-8 principal component removal. The 8 dominant common-topic directions of the corpus are fitted once via SVD and projected out of every chunk and query vector (Arora et al. 2017).
  3. File-path header injection. Before each chunk is encoded, the tokens of its file path (e.g. model_fetcher, search, engines) are prepended 15 times. The file name effectively becomes a "tag" that the chunk is retrieved on.
  4. Search-time file-extension score bias. Within the top-50 dense candidates, .py chunks get +0.05 and .md chunks get -0.02. This fixes the common failure where README.md and docs/*.md outrank the actual code (higher topic overlap but lower action relevance).
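
The first three upgrades can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the code in lf4_v2.py: the helper names (sif_weights, pool, fit_pcs, remove_pcs, with_path_header) are hypothetical, pooling is assumed to be a SIF-weighted sum of static token vectors, the PCs are taken from the uncentered corpus matrix as in Arora et al., and the path header is assumed to be injected as plain text before tokenization. The extension bias (upgrade 4) is not covered here.

```python
import numpy as np
from pathlib import PurePosixPath


def sif_weights(token_lists, vocab_size, a=1e-4):
    """Upgrade 1: weight a / (a + p(t)), p(t) = relative corpus frequency of token t."""
    counts = np.zeros(vocab_size, dtype=np.float64)
    for ids in token_lists:
        # np.add.at accumulates correctly when a token id repeats within a chunk
        np.add.at(counts, np.asarray(ids, dtype=np.int64), 1.0)
    p = counts / max(counts.sum(), 1.0)
    return a / (a + p)  # tokens never seen in the corpus (p = 0) keep weight 1.0


def pool(ids, emb, weights):
    """SIF-weighted sum of static token vectors (the pooling rule is an assumption)."""
    ids = np.asarray(ids, dtype=np.int64)
    return (weights[ids][:, None] * emb[ids]).sum(axis=0)


def fit_pcs(X, k=8):
    """Upgrade 2: top-k right singular vectors of the uncentered (n, dim) corpus matrix."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k]  # (k, dim) with orthonormal rows


def remove_pcs(X, pcs, strength=1.0):
    """Subtract each row's projection onto the fitted PCs; strength=0 disables removal."""
    return X - strength * (X @ pcs.T) @ pcs


def with_path_header(path, chunk, repeat=15):
    """Upgrade 3: prepend the directory names and file stem `repeat` times as plain text."""
    p = PurePosixPath(path)
    parts = [x for x in (*p.parent.parts, p.stem) if x not in ("/", ".", "")]
    header = " ".join(parts)
    return " ".join([header] * repeat) + "\n" + chunk
```

A plausible order of use: fit SIF weights on the tokenized chunks, pool each header-prefixed chunk, fit the PCs on the pooled matrix, then remove them from both chunk and query vectors before L2-normalizing. Because PC removal is linear and the result is normalized, a weighted sum and a weighted mean give identical cosine scores.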

Benchmark

Corpus: 5,168 chunks × 256 dims across 349 files in the Webscout codebase. Queries: 51 hand-verified natural-language → file-path pairs.

| Model | R@1 | R@5 | R@10 | MRR | enc@1 (ms) | enc@64 (ms) | search@64 (ms) |
|---|---|---|---|---|---|---|---|
| Vortex-Embed v1 (baseline) | 0.314 | 0.667 | 0.863 | 0.478 | 6.2 | 227 | 4.2 |
| Vortex-Embed v2 (this) | 0.745 | 0.843 | 0.882 | 0.779 | 6.4 | 107 | 9.1 |

That is +137% R@1 and +63% MRR. Encoding a batch of 64 chunks is 2.1× faster (107 ms vs 227 ms) thanks to the same torch.scatter_add_ (ATen) and sorted reduceat kernels used in v1, while search@64 latency rises from 4.2 ms to 9.1 ms.
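
For reference, the metrics and relative gains above can be computed as follows. This sketch assumes one gold file path per query and a ranked list of file paths per query; the card does not say how multiple chunks from the same file are counted, so treat the helper names and that detail as assumptions.

```python
def recall_at_k(ranked, gold, k):
    """Fraction of queries whose gold item appears in the first k results."""
    hits = sum(1 for r, g in zip(ranked, gold) if g in r[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked, gold):
    """Mean of 1 / rank of the gold item; a query contributes 0 if it is not retrieved."""
    total = 0.0
    for r, g in zip(ranked, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)


def relative_gain(new, old):
    return (new - old) / old


print(f"R@1 gain: {relative_gain(0.745, 0.314):+.0%}")  # R@1 gain: +137%
print(f"MRR gain: {relative_gain(0.779, 0.478):+.0%}")  # MRR gain: +63%
print(f"enc@64 speedup: {227 / 107:.1f}×")              # enc@64 speedup: 2.1×
```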

Usage

import sys

from huggingface_hub import snapshot_download

# Download model + tokenizer + config (lf4_v2.py ships in the same repo)
path = snapshot_download("VTXAI/Vortex-Embed-v2")
sys.path.insert(0, path)  # make the bundled lf4_v2.py importable
from lf4_v2 import VortexEmbedV2

# Load
model = VortexEmbedV2.from_pretrained(path)
print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")

# Single-query encode
vec = model.encode("find python json parser", normalize=True)
# vec.shape == (256,)

# Batch encode
docs = [
    "def parse_json(s): return json.loads(s)",
    "class WeatherAPI: pass",
    "import requests",
]
doc_embs = model.encode(docs, normalize=True)  # (3, 256)

# Search: results come back with a leading query axis
scores, indices = model.search(vec, doc_embs, top_k=3)
# scores.shape == (1, 3), indices.shape == (1, 3)
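
To reproduce plain dense search outside the class, the shapes above are consistent with an inner-product top-k that keeps a leading query axis. The sketch below assumes model.search scores by dot product over L2-normalized vectors (i.e. cosine similarity); topk_search is a hypothetical stand-in and does not apply the v2 file-extension bias.

```python
import numpy as np


def topk_search(query_embs, doc_embs, top_k=10):
    """Dot-product top-k with a leading query axis (cosine if rows are L2-normalized)."""
    q = np.atleast_2d(query_embs)            # (dim,) -> (1, dim)
    scores = q @ doc_embs.T                  # (n_queries, n_docs)
    k = min(top_k, doc_embs.shape[0])
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]  # k best, unordered
    top = np.take_along_axis(scores, idx, axis=1)
    order = np.argsort(-top, axis=1)                      # sort only those k
    idx = np.take_along_axis(idx, order, axis=1)
    return np.take_along_axis(scores, idx, axis=1), idx
```

Using argpartition avoids fully sorting every chunk score; only the k winners are sorted, which matters little at 3 documents but helps at corpus scale.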

Codebase retrieval (the real use case)

from pathlib import Path

# lf4_v2.py from the model repo must be importable (copy it into your project)
from lf4_v2 import VortexEmbedV2

CHUNK_LINES, OVERLAP = 40, 5
STEP = CHUNK_LINES - OVERLAP  # each window starts 35 lines after the previous one

# 1. Chunk a codebase (line-based, 40 lines/chunk, 5-line overlap)
chunks, texts = [], []
for path in sorted(Path("./src").rglob("*.py")):
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()  # read once
    for start in range(0, len(lines), STEP):
        chunk = "\n".join(lines[start:start + CHUNK_LINES])
        chunks.append((str(path), start, chunk))
        texts.append(chunk)
        if start + CHUNK_LINES >= len(lines):
            break  # this window already reaches the end of the file

# 2. Load + bind paths (this enables file-path header injection)
model = VortexEmbedV2.from_pretrained("VTXAI/Vortex-Embed-v2")
model.set_file_paths([c[0] for c in chunks])  # one path per chunk, same order as texts; critical for v2 quality

# 3. Fit IDF on the corpus (one-time, ~200 ms)
token_lists = [model.tokenizer.encode(t).ids for t in texts]
model.fit_idf(token_lists)

# 4. Encode corpus
corpus_emb = model.encode_batch(texts, normalize=True)  # (n, 256)

# 5. Fit the top-k PCs on the corpus (one-time, ~300 ms)
model.fit_pc(corpus_emb, k=8)

# 6. Re-encode with PC removal applied
corpus_emb = model.encode_batch(texts, normalize=True)

# 7. Query
query = "where do we parse JSON requests"
q_emb = model.encode(query, normalize=True)
scores, indices = model.search(q_emb, corpus_emb, top_k=10)
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    file, start, _ = chunks[i]
    print(f"#{rank} ({s:.3f}) {file}:{start + 1}")  # 1-based line number
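
Overlapping chunks mean the top-10 can contain several hits from the same file. Since the benchmark pairs each query with a file path, you may prefer one result per file. The helper below is a hypothetical post-processing step, not part of lf4_v2; it assumes one query's results sorted by descending score and chunks stored as (path, start_line, text) tuples.

```python
def best_chunk_per_file(scores, indices, chunks, max_files=5):
    """Collapse ranked chunk hits to one entry per file, keeping each file's best chunk.

    Assumes `scores`/`indices` are one query's results sorted by descending score
    and `chunks[i]` is a (path, start_line, text) tuple.
    """
    seen, results = set(), []
    for score, i in zip(scores, indices):
        path, start, _ = chunks[i]
        if path in seen:
            continue  # a higher-scoring chunk of this file was already kept
        seen.add(path)
        results.append((path, start, float(score)))
        if len(results) == max_files:
            break
    return results
```

With the variables from the example, `best_chunk_per_file(scores[0], indices[0], chunks)` returns up to five (path, start_line, score) tuples.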

Configuration knobs

All retrieval hyperparameters live in config.json and can be overridden at load time:

model = VortexEmbedV2.from_pretrained(
    "VTXAI/Vortex-Embed-v2",
    sif_a=1e-3,           # SIF smoothing (default 1e-4; higher = flatter weighting)
    pc_k=0,               # disable PC removal
    header_repeat=10,     # reduce path-header weight (default 15)
    py_bonus=0.0,         # no .py bonus...
    md_penalty=0.0,       # ...and no .md penalty: extension bias fully disabled
)

| Knob | Default | Effect |
|---|---|---|
| sif_a | 1e-4 | SIF smoothing; lower = sharper IDF weighting |
| pc_k | 8 | Number of principal components to remove |
| sif_pc | 1.0 | PC removal strength (0 = disabled) |
| header_repeat | 15 | How many times to repeat path-header tokens |
| py_bonus | 0.05 | Score boost for .py chunks in the top-50 |
| md_penalty | -0.02 | Score penalty for .md chunks in the top-50 |
| bias_top_k | 50 | Candidate pool size for the extension bias |
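
The three bias knobs can be read as a small rerank over the dense candidates. The sketch below assumes the offsets are added to the raw cosine scores of the top bias_top_k hits, which are then re-sorted, and that only that pool is returned; extension_bias is a hypothetical name, and the real implementation may keep the tail or break ties differently.

```python
import numpy as np


def extension_bias(scores, indices, paths, py_bonus=0.05, md_penalty=-0.02, bias_top_k=50):
    """Offset the top `bias_top_k` dense hits by file extension, then re-sort them.

    `scores`/`indices` are one query's dense results in descending score order;
    `paths[i]` is the file path of chunk i. Only the candidate pool is returned.
    """
    pool_scores = np.asarray(scores, dtype=np.float64)[:bias_top_k].copy()
    pool_idx = np.asarray(indices)[:bias_top_k]
    for j, i in enumerate(pool_idx):
        if paths[i].endswith(".py"):
            pool_scores[j] += py_bonus
        elif paths[i].endswith(".md"):
            pool_scores[j] += md_penalty  # md_penalty is stored as a negative number
    order = np.argsort(-pool_scores, kind="stable")  # stable: ties keep dense order
    return pool_scores[order], pool_idx[order]
```

With the defaults, a README.md chunk at 0.60 and an engine.py chunk at 0.56 become 0.58 and 0.61, so the code chunk moves ahead of the README.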

Files

  • model.safetensors: 4-bit LF4 packed weights (3.7 MB)
  • embedding_scales (FP16), embedding_zeros (FP16): per-block quantization params
  • config.json: model + retrieval config
  • tokenizer.json: HuggingFace fast tokenizer (29 KB)
  • lf4_v2.py: self-contained model class (drop-in to any project)
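
The packed size follows from the shape: 29,528 × 256 = 7,559,168 weights, and at two 4-bit codes per byte that is 3,779,584 bytes (about 3.8 MB). A block size of 32 is an assumption here, but it fits the listed totals: 236,224 blocks with one FP16 scale and one FP16 zero each add 472,448 parameters (4,252,032 ≈ 4.25M in total) and about 0.94 MB, for roughly 4.7 MB on disk. LF4's actual level mapping is not described in this card, so the sketch below swaps in plain uniform asymmetric 4-bit quantization to show the scale/zero/packing layout; quantize_4bit and dequantize_4bit are hypothetical names.

```python
import numpy as np


def quantize_4bit(w, block=32):
    """Uniform asymmetric 4-bit quantization per block, two codes packed per uint8.

    Assumes w.size is divisible by `block` and `block` is even.
    """
    flat = w.astype(np.float32).reshape(-1, block)
    lo = flat.min(axis=1, keepdims=True)                  # per-block zero offset
    hi = flat.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 15.0              # 16 levels: codes 0..15
    q = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8).reshape(-1)
    packed = (q[0::2] | (q[1::2] << 4)).astype(np.uint8)  # even index in low nibble
    return packed, scale.astype(np.float16), lo.astype(np.float16)


def dequantize_4bit(packed, scales, zeros, shape, block=32):
    """Unpack nibbles and map codes back to floats: code * scale + zero."""
    q = np.empty(packed.size * 2, dtype=np.uint8)
    q[0::2] = packed & 0x0F
    q[1::2] = packed >> 4
    flat = q.reshape(-1, block).astype(np.float32)
    return (flat * scales.astype(np.float32) + zeros.astype(np.float32)).reshape(shape)
```

Per-block parameters keep the rounding error bounded by about half a quantization step of each block, at the cost of storing two FP16 values per 32 weights.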

Citation

The SIF/PC technique is from:

Arora, Liang, Ma (2017). A Simple but Tough-to-Beat Baseline for Sentence Embeddings. ICLR.

The LF4 quantization is from:

Original Vortex-Embed-4.7M model card on VTXAI/Vortex-Embed-4.7M.

If you use v2 in research, please cite the original Vortex-Embed paper and this AutoResearch loop (see Vortex-AutoResearch).

Safetensors model size: 4.25M params (tensor types: F16, U8)