Static Embeddings

This project contains multilingual static embeddings suited to generating fast embeddings on edge devices. They are re-packaged from other projects into production-ready assets.

Models

Updating

Add models to scripts/build_models.py.
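
A minimal sketch of what registering a model could look like, assuming the script keeps a list of model entries; the MODELS name and its fields are illustrative, not the actual structure of scripts/build_models.py:

# Hypothetical registration entry; field names are assumptions for illustration.
MODELS = [
    {
        # Hugging Face repo id of the source static-embedding model.
        "source": "minishlab/potion-multilingual-128M",
        # Precisions to export for the production-ready assets.
        "precisions": ["fp32", "fp16", "fp8_e4m3", "fp8_e5m2"],
    },
]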

# Install dependencies and login to huggingface:
pipx install huggingface_hub
huggingface-cli login

# Re-build the models:
uv run scripts/build_models.py

# Version control:
git add .
git commit -m 'Model updates'
git push
git tag v1.0.0 -m 'Model release description'
git push origin tag v1.0.0

Precision

For static embeddings and cosine similarity, precision is not as important. In an end-to-end test in Firefox on some vectors, the cosine similarity for the same mean-pooled result was as follows. Note that the vector math happens in f32 space, while the embeddings are stored at a lower precision.

f32 vs f16: cosine similarity = 1.00000000
→ They are essentially identical in direction.

f32 vs f8: cosine similarity = 0.99956375
→ Very close, only tiny quantization effects.

Note that this was measured with torch.float8_e4m3fn; torch.float8_e5m2 generally has more loss.
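
A minimal sketch of that comparison, assuming PyTorch; the vector below is a random placeholder rather than the actual Firefox test data:

import torch
import torch.nn.functional as F

# A mean-pooled sentence vector in f32 (random stand-in for real data).
torch.manual_seed(0)
embedding_f32 = torch.randn(128)

# Store at lower precision, then cast back to f32 before any math,
# mirroring lower-precision storage with vector math done in f32 space.
stored_f16 = embedding_f32.to(torch.float16).to(torch.float32)
stored_f8 = embedding_f32.to(torch.float8_e4m3fn).to(torch.float32)

print(F.cosine_similarity(embedding_f32, stored_f16, dim=0))  # ~1.0
print(F.cosine_similarity(embedding_f32, stored_f8, dim=0))   # slightly below 1.0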

Precision also affects download size. For instance, with the larger minishlab/potion-multilingual-128M model, fp32 is 228M compressed, while fp8_e4m3, which has competitive quantization quality, is only 51M.

precision   dimensions   compressed size
fp32        128          228M
fp16        128          114M
fp8_e4m3    128           51M
fp8_e5m2    128           44M
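
The sizes above are for compressed archives; the raw storage scales with bytes per element, which a quick estimate can illustrate. The vocabulary size below is an assumption for illustration, not a figure from this repository:

# Rough uncompressed size per precision; vocab_size is assumed, not measured.
vocab_size = 500_000
dims = 128
bytes_per_element = {"fp32": 4, "fp16": 2, "fp8_e4m3": 1, "fp8_e5m2": 1}

for precision, nbytes in bytes_per_element.items():
    megabytes = vocab_size * dims * nbytes / (1024 * 1024)
    print(f"{precision}: ~{megabytes:.0f}M uncompressed")

Compression then shrinks these further, which is presumably why the two fp8 variants differ in download size despite identical raw byte counts.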