Static Embeddings

This project contains multilingual static embeddings suited to generating fast embeddings on edge devices. They are re-packaged from other projects into production-ready assets.

Models

Updating

Add models to scripts/build_models.py.
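
A minimal sketch of what registering a model could look like, assuming the script keeps a list of model entries; the MODELS name and its fields are illustrative, not the actual structure of scripts/build_models.py:

# Hypothetical registration entry; field names are assumptions for illustration.
MODELS = [
    {
        # Hugging Face repo id of the source static-embedding model.
        "source": "minishlab/potion-multilingual-128M",
        # Precisions to export for the production-ready assets.
        "precisions": ["fp32", "fp16", "fp8_e4m3", "fp8_e5m2"],
    },
]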

# Install dependencies and login to huggingface:
pipx install huggingface_hub
huggingface-cli login

# Re-build the models:
uv run scripts/build_models.py

# Version control:
git add .
git commit -m 'Model updates'
git push
git tag v1.0.0 -m 'Model release description'
git push origin tag v1.0.0

Precision

For static embeddings and cosine similarity, precision is not as important. In an end-to-end test in Firefox on some vectors, the cosine similarity for the same mean-pooled result was as follows. Note that the vector math happens in f32 space, while the embeddings are stored at a lower precision.

f32 vs f16: cosine similarity = 1.00000000
→ They are essentially identical in direction.

f32 vs f8: cosine similarity = 0.99956375
→ Very close, only tiny quantization effects.

Note that this was measured with torch.float8_e4m3fn; torch.float8_e5m2 generally has more loss.
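
A minimal sketch of that comparison, assuming PyTorch; the vector below is a random placeholder rather than the actual Firefox test data:

import torch
import torch.nn.functional as F

# A mean-pooled sentence vector in f32 (random stand-in for real data).
torch.manual_seed(0)
embedding_f32 = torch.randn(128)

# Store at lower precision, then cast back to f32 before any math,
# mirroring lower-precision storage with vector math done in f32 space.
stored_f16 = embedding_f32.to(torch.float16).to(torch.float32)
stored_f8 = embedding_f32.to(torch.float8_e4m3fn).to(torch.float32)

print(F.cosine_similarity(embedding_f32, stored_f16, dim=0))  # ~1.0
print(F.cosine_similarity(embedding_f32, stored_f8, dim=0))   # slightly below 1.0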

Precision also affects download size. For instance, with the larger minishlab/potion-multilingual-128M model, fp32 is 228M compressed, while fp8_e4m3, which has competitive quantization quality, is only 51M.

precision   dimensions   compressed size
fp32        128          228M
fp16        128          114M
fp8_e4m3    128           51M
fp8_e5m2    128           44M
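
The sizes above are for compressed archives; the raw storage scales with bytes per element, which a quick estimate can illustrate. The vocabulary size below is an assumption for illustration, not a figure from this repository:

# Rough uncompressed size per precision; vocab_size is assumed, not measured.
vocab_size = 500_000
dims = 128
bytes_per_element = {"fp32": 4, "fp16": 2, "fp8_e4m3": 1, "fp8_e5m2": 1}

for precision, nbytes in bytes_per_element.items():
    megabytes = vocab_size * dims * nbytes / (1024 * 1024)
    print(f"{precision}: ~{megabytes:.0f}M uncompressed")

Compression then shrinks these further, which is presumably why the two fp8 variants differ in download size despite identical raw byte counts.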