
A BERT model for computing sentence embeddings for Russian. It is based on cointegrated/LaBSE-en-ru and has the same context size (512 tokens), the same embedding size (768) and the same speed.

Usage:

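```python
from sentence_transformers import SentenceTransformer, util

# Download the model from the Hugging Face Hub (cached after the first run)
model = SentenceTransformer('sergeyzh/LaBSE-ru-turbo')

sentences = ["привет мир", "hello world", "здравствуй вселенная"]
# One 768-dimensional vector per sentence
embeddings = model.encode(sentences)
# 3x3 matrix of pairwise dot-product similarities
print(util.dot_score(embeddings, embeddings))
```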
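`util.dot_score` returns the matrix of pairwise dot products between two sets of embeddings, so entry [i][j] compares sentence i with sentence j. The NumPy sketch below reproduces that computation on three made-up 4-dimensional vectors (hypothetical stand-ins for the real 768-dimensional embeddings) and shows that once each row is L2-normalized the dot product is the same as cosine similarity. Whether this model's `encode` output is already normalized is not stated on this card; passing `normalize_embeddings=True` to `model.encode` guarantees it. The helper names below are illustrative only.

```python
import numpy as np

# Hypothetical toy vectors standing in for real sentence embeddings.
embeddings = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.2, 0.4, 0.1],
    [0.1, 0.9, 0.2, 0.7],
])

def dot_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise dot products: entry [i, j] = a[i] . b[j]."""
    return a @ b.T

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale every row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity = dot product of L2-normalized rows."""
    return dot_matrix(l2_normalize(a), l2_normalize(b))

raw = dot_matrix(embeddings, embeddings)    # depends on vector lengths
normed = l2_normalize(embeddings)
sim = dot_matrix(normed, normed)            # now equal to cosine similarity
print(np.round(sim, 3))
```

With unnormalized vectors, longer embeddings get larger dot scores regardless of direction; normalizing first makes the scores comparable across sentences.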

Metrics

Scores of the model on the encodechka benchmark:

| Model | CPU (ms) | GPU (ms) | size (MB) | Mean S | Mean S+W | dim |
|---|---|---|---|---|---|---|
| sergeyzh/LaBSE-ru-turbo | 120.40 | 8.05 | 490 | 0.789 | 0.702 | 768 |
| BAAI/bge-m3 | 523.40 | 22.50 | 2166 | 0.787 | 0.696 | 1024 |
| intfloat/multilingual-e5-large | 506.80 | 30.80 | 2136 | 0.780 | 0.686 | 1024 |
| intfloat/multilingual-e5-base | 130.61 | 14.39 | 1061 | 0.761 | 0.669 | 768 |
| sergeyzh/rubert-tiny-turbo | 5.51 | 3.25 | 111 | 0.749 | 0.667 | 312 |
| intfloat/multilingual-e5-small | 40.86 | 12.09 | 449 | 0.742 | 0.645 | 384 |
| cointegrated/LaBSE-en-ru | 120.40 | 8.05 | 490 | 0.739 | 0.667 | 768 |
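
| Model | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 |
|---|---|---|---|---|---|---|---|---|---|---|
| sergeyzh/LaBSE-ru-turbo | 0.864 | 0.748 | 0.490 | 0.814 | 0.974 | 0.806 | 0.815 | 0.801 | 0.305 | 0.404 |
| BAAI/bge-m3 | 0.864 | 0.749 | 0.510 | 0.819 | 0.973 | 0.792 | 0.809 | 0.783 | 0.240 | 0.422 |
| intfloat/multilingual-e5-large | 0.862 | 0.727 | 0.473 | 0.810 | 0.979 | 0.798 | 0.819 | 0.773 | 0.224 | 0.374 |
| intfloat/multilingual-e5-base | 0.835 | 0.704 | 0.459 | 0.796 | 0.964 | 0.783 | 0.802 | 0.738 | 0.235 | 0.376 |
| sergeyzh/rubert-tiny-turbo | 0.828 | 0.722 | 0.476 | 0.787 | 0.955 | 0.757 | 0.780 | 0.685 | 0.305 | 0.373 |
| intfloat/multilingual-e5-small | 0.822 | 0.714 | 0.457 | 0.758 | 0.957 | 0.761 | 0.779 | 0.691 | 0.234 | 0.275 |
| cointegrated/LaBSE-en-ru | 0.794 | 0.659 | 0.431 | 0.761 | 0.946 | 0.766 | 0.789 | 0.769 | 0.340 | 0.414 |

Tasks: STS, semantic textual similarity; PI, paraphrase identification; NLI, natural language inference; SA, sentiment analysis; TI, toxicity identification; IA, inappropriateness assessment; IC, intent classification; ICX, cross-lingual intent classification; NE1 and NE2, named-entity recognition.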
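The aggregate columns of the first table appear to be plain averages of these per-task scores: for sergeyzh/LaBSE-ru-turbo, the mean of the eight sentence-level tasks (STS through ICX) gives 0.789 (Mean S), and the mean of all ten tasks, including the word-level NE1 and NE2, gives 0.702 (Mean S+W). The sketch below recomputes both for two rows. Reading NE1/NE2 as the only word-level tasks is an inference from the numbers, not something this card states, and a few other rows (e.g. intfloat/multilingual-e5-large) do not reproduce exactly from the rounded values shown, so treat this as a reading of the table rather than the benchmark's definition. `SENTENCE_TASKS` and `aggregate` are hypothetical names.

```python
# Per-task encodechka scores copied from the table, in column order:
# STS, PI, NLI, SA, TI, IA, IC, ICX, NE1, NE2
SENTENCE_TASKS = ["STS", "PI", "NLI", "SA", "TI", "IA", "IC", "ICX"]

scores = {
    "sergeyzh/LaBSE-ru-turbo":
        [0.864, 0.748, 0.490, 0.814, 0.974, 0.806, 0.815, 0.801, 0.305, 0.404],
    "cointegrated/LaBSE-en-ru":
        [0.794, 0.659, 0.431, 0.761, 0.946, 0.766, 0.789, 0.769, 0.340, 0.414],
}

def aggregate(row: list[float]) -> tuple[float, float]:
    """Return (Mean S, Mean S+W), assuming both are unweighted averages."""
    n_sent = len(SENTENCE_TASKS)
    mean_s = sum(row[:n_sent]) / n_sent   # sentence-level tasks only
    mean_sw = sum(row) / len(row)         # sentence- and word-level tasks
    return round(mean_s, 3), round(mean_sw, 3)

for model_name, row in scores.items():
    mean_s, mean_sw = aggregate(row)
    print(f"{model_name}: Mean S = {mean_s:.3f}, Mean S+W = {mean_sw:.3f}")
```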

Model size: 128M parameters (Safetensors, tensor type F32).
