UniVec Benchmark: openai-text-embedding-3-small -> openai-text-embedding-ada-002

A vector conversion model trained on MTEB-aligned data. It is published to allow independent comparison against existing vector translation research.

Purpose: research and benchmarking. For production conversion, prefer the corresponding general-release model from the UniVec organization or the hosted API at https://univec.ai.

What is vector conversion?

A corpus embedded with a particular model is bound to that model's vector space: queries must be encoded by the same model for nearest-neighbour search to remain meaningful. Migrating to a different embedder (whether driven by deprecation, an upgrade or a provider change) normally requires re-embedding every document. The cost scales with corpus size and recurs each time the underlying model changes.

A conversion model takes pre-computed source-space vectors and outputs target-space vectors. The training objective is retrieval-order preservation: top-K nearest neighbours in the converted space should align with top-K in the target space despite differences in dimensionality, distance distribution and noise structure.
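One way to make the retrieval-order objective concrete is to measure how much the top-K neighbour sets agree between two spaces. The following is a minimal sketch, not UniVec's training loss or evaluation code: the helper names topk_neighbours and topk_overlap are hypothetical, and the random vectors only stand in for real converted and target embeddings.

```python
import numpy as np

def topk_neighbours(vectors: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most cosine-similar rows for each row (self excluded)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a vector is not its own neighbour
    return np.argsort(-sims, axis=1)[:, :k]

def topk_overlap(converted: np.ndarray, target: np.ndarray, k: int = 10) -> float:
    """Mean fraction of top-k neighbours shared between the two spaces."""
    conv_nn = topk_neighbours(converted, k)
    tgt_nn = topk_neighbours(target, k)
    per_row = [len(set(a) & set(b)) / k for a, b in zip(conv_nn, tgt_nn)]
    return float(np.mean(per_row))

# Toy data (assumption): 200 vectors of dimension 64, not real embeddings.
rng = np.random.default_rng(0)
target = rng.standard_normal((200, 64)).astype(np.float32)
noisy = target + 0.05 * rng.standard_normal(target.shape).astype(np.float32)
unrelated = rng.standard_normal((200, 64)).astype(np.float32)

print("identical:", topk_overlap(target, target))
print("noisy copy:", topk_overlap(noisy, target))
print("unrelated:", topk_overlap(unrelated, target))
```

A well-behaved converter should sit close to the "identical" case; the unrelated case shows roughly what chance-level overlap looks like for the chosen K and corpus size.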

Why a separate benchmark track?

General-release UniVec converters are trained on broad, heterogeneous corpora to generalise across domains. The metrics on those cards reflect retrieval quality on a generic eval split that mixes many sources.

Models in this track are trained against MTEB-aligned distributions and report numbers directly comparable with published translation benchmarks. The figures below should be read as an upper bound on direct conversion quality under controlled benchmark conditions, not as what to expect on arbitrary downstream data.

Self-reported translation results are difficult to verify without access to weights and evaluation protocol. These weights are released to make the comparison reproducible: open weights, open metrics, identical evaluation script.

Evaluation

The metrics below are computed on the MTEB-aligned held-out split by comparing converted vectors against ground-truth openai-text-embedding-ada-002 embeddings of the same texts.

Metric Value
MRR 1.0000
P@1 1.0000
P@5 1.0000
P@10 1.0000
Cosine (mean) 0.9582
Cosine (median) 0.9582
Cosine (std) 0.0154
Kendall tau 0.6924
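For readers who want an intuition for the retrieval rows in the table, here is a hedged sketch of one plausible reading of them. It assumes each converted vector is used as a query against all ground-truth target vectors and that P@K means "the paired target appears in the top K"; the bundled evaluate.py may define or sample these quantities differently. The function name paired_retrieval_metrics and the toy data are hypothetical.

```python
import numpy as np

def paired_retrieval_metrics(converted, target, ks=(1, 5, 10)):
    """MRR, hit rate at K and mean paired cosine for row-aligned vector sets."""
    c = converted / np.linalg.norm(converted, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    sims = c @ t.T                        # (N, N) cosine similarities
    true_sims = np.diag(sims)[:, None]    # similarity to the paired target
    # Rank of the paired target among all targets (1 = ranked first).
    ranks = 1 + (sims > true_sims).sum(axis=1)
    metrics = {"mrr": float(np.mean(1.0 / ranks))}
    for k in ks:
        metrics[f"hit@{k}"] = float(np.mean(ranks <= k))
    metrics["cosine_mean"] = float(np.mean(true_sims))
    return metrics

# Toy data (assumption): a noisy copy plays the role of converter output.
rng = np.random.default_rng(0)
target = rng.standard_normal((256, 32)).astype(np.float32)
converted = target + 0.1 * rng.standard_normal(target.shape).astype(np.float32)

perfect = paired_retrieval_metrics(target, target)
approx = paired_retrieval_metrics(converted, target)
print(perfect)
print(approx)
```

Under this reading, retrieval scores of 1.0 can coexist with a mean cosine below 1.0: the paired target only needs to be closer than every other candidate, not identical.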

Fields and inference shape are identical to the general-release UniVec models. The only differences are the training distribution and the intended use.

Training data

Field Value
Training pairs 115,371
Held-out eval pairs 12,819

Inputs and outputs are unit-normalized 2D arrays with shape (batch, dim). The ONNX file is t2s.direct.inn.openai-text-embedding-3-small.openai-text-embedding-ada-002.onnx.
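The input contract above can be enforced before calling the model. This is a minimal sketch under the stated shape and normalization requirements; prepare_batch is a hypothetical helper, not part of the published scripts.

```python
import numpy as np

def prepare_batch(vectors) -> np.ndarray:
    """Return a float32 array of shape (batch, dim) with unit-length rows."""
    arr = np.asarray(vectors, dtype=np.float32)
    if arr.ndim == 1:
        arr = arr[None, :]  # promote a single vector to a batch of one
    if arr.ndim != 2:
        raise ValueError(f"expected a 1D or 2D input, got {arr.ndim}D")
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("zero vectors cannot be unit-normalized")
    return arr / norms

# Toy 3-d vector for illustration; real inputs would be 1536-d.
batch = prepare_batch([3.0, 0.0, 4.0])
print(batch.shape, batch.dtype, np.linalg.norm(batch, axis=1))
```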

Quick start

Install dependencies

  1. Via uv package manager:
# 1. Install uv (one-time, skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh           # Linux / macOS
# brew install uv                                          # macOS via Homebrew
# powershell -c "irm https://astral.sh/uv/install.ps1 | iex"   # Windows

# 2. Create an isolated venv for this model
uv venv
source .venv/bin/activate                                  # Linux / macOS
# .venv\Scripts\activate                                   # Windows

# 3. Install the CPU inference dependencies pinned in requirements.txt
uv pip install -r requirements.txt

For NVIDIA GPU inference, swap the runtime (don't install both):

uv pip install onnxruntime-gpu numpy

GPU inference requires a working CUDA + cuDNN runtime on the host. The ONNX Runtime CUDA compatibility matrix lists which versions go together.

  2. Plain pip:
pip install -r requirements.txt

Run inference

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "t2s.direct.inn.openai-text-embedding-3-small.openai-text-embedding-ada-002.onnx",
    providers=["CPUExecutionProvider"],  # or ["CUDAExecutionProvider", "CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name

# Placeholder input: random vectors standing in for real
# openai-text-embedding-3-small embeddings, shape (N, 1536)
embeddings = np.random.randn(8, 1536).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

converted = session.run(None, {input_name: embeddings})[0]
converted /= np.linalg.norm(converted, axis=1, keepdims=True)

print(converted.shape)  # (8, 1536) here, i.e. (N, 1536) in openai-text-embedding-ada-002 space

For batching, GPU execution and .npy / .jsonl file IO, use the companion script univec_inference.py published alongside this model. The requirements.txt file in this repo pins the inference dependencies.

Reproducing the metrics

A self-contained evaluate.py is included in this repo. It runs the converter against a paired evaluation dataset and reports the same metrics shown in the table above (cosine, MRR, P@K, Kendall tau). It is the canonical way to reproduce the published numbers or compare them against a different held-out split.

The expected dataset is JSONL, one record per line, each holding both a source-space and a target-space embedding of the same text:

{"embeddings": {"openai-text-embedding-3-small": [/* 1536-d vector */], "openai-text-embedding-ada-002": [/* 1536-d vector */]}}
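A file in this format could be read into paired arrays roughly as follows. This is a sketch based only on the record layout shown above; load_pairs is a hypothetical helper, evaluate.py ships its own loader, and the 4-d vectors in the demo are toy stand-ins for 1536-d embeddings.

```python
import json
import os
import tempfile

import numpy as np

SOURCE = "openai-text-embedding-3-small"
TARGET = "openai-text-embedding-ada-002"

def load_pairs(path, source=SOURCE, target=TARGET):
    """Read a JSONL file into row-aligned (source, target) float32 arrays."""
    src, tgt = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            embeddings = json.loads(line)["embeddings"]
            src.append(embeddings[source])
            tgt.append(embeddings[target])
    return np.asarray(src, dtype=np.float32), np.asarray(tgt, dtype=np.float32)

# Demo: write three toy records to a temporary file, then load them back.
rng = np.random.default_rng(0)
records = [
    {"embeddings": {SOURCE: rng.standard_normal(4).tolist(),
                    TARGET: rng.standard_normal(4).tolist()}}
    for _ in range(3)
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eval.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    src, tgt = load_pairs(path)

print(src.shape, tgt.shape)
```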

Then run:

python evaluate.py \
  --model t2s.direct.inn.openai-text-embedding-3-small.openai-text-embedding-ada-002.onnx \
  --dataset eval.jsonl \
  --source openai-text-embedding-3-small \
  --target openai-text-embedding-ada-002 \
  --output metrics.json

Useful flags:

Flag                      Default  Purpose
--max-samples N           all      cap the number of pairs evaluated
--device {auto,cuda,cpu}  auto     pick the ONNX execution provider
--batch-size N            1024     inference batch size
--num-anchors N           1024     number of query anchors used for MRR / P@K
--kendall-subset N        2048     sample size for Kendall tau pairwise rank correlation
--seed N                  0        deterministic sampling seed
--output FILE.json        none     write metrics to JSON for downstream comparison

The script prints a summary table and writes the same numbers to JSON if --output is set. Without scipy installed, Kendall tau is skipped and the other metrics are still reported.

Comparing against prior work

Prior research on translation between embedding spaces has reported baselines on related protocols. The figures here come from a comparable evaluation setup and can be cited alongside or against existing results. Citation and a link back are appreciated for any published comparison.

Intended use and limitations

  • Intended for: research, benchmark reproduction, ablation studies and qualitative inspection of translation behaviour.
  • Not intended for: production retrieval at scale. The matching general-release model is a better fit there.
  • The MTEB alignment of the training distribution makes the eval metrics here optimistic compared to arbitrary user data.

License

Apache 2.0.

Citation

@misc{univec2026,
  author = {UniVec},
  title  = {UniVec: Embedding interoperability for retrieval tasks},
  year   = {2026},
  url    = {https://univec.ai}
}