A BERT model for computing sentence embeddings for Russian text. The model is based on cointegrated/LaBSE-en-ru and has the same context size (512), embedding dimension (768), and inference speed.
Usage:
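from sentence_transformers import SentenceTransformer, util
# Load the model by its Hugging Face Hub identifier
model = SentenceTransformer('sergeyzh/LaBSE-ru-turbo')
sentences = ["привет мир", "hello world", "здравствуй вселенная"]
# Encode the sentences into embeddings, one row per sentence
embeddings = model.encode(sentences)
# Print the matrix of pairwise dot-product scores between all sentences
print(util.dot_score(embeddings, embeddings))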
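To make the shape and meaning of that score matrix concrete, here is a minimal pure-Python sketch of a pairwise dot-product score. The vectors are toy 3-dimensional stand-ins, not real model outputs, and the helper names `dot_scores` and `l2_normalize` are hypothetical. It also shows that dot-product scores coincide with cosine similarity only when the vectors are scaled to unit length. Whether this model's embeddings are already normalized is not stated here, so treat that as something to check.

```python
import math


def dot_scores(vectors):
    """Return the square matrix of pairwise dot products between all vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy vectors (assumption): stand-ins for the 768-dimensional embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 2.0],
]

# Raw dot products depend on vector length as well as direction.
raw = dot_scores(toy_embeddings)

# After normalization the same computation yields cosine similarities.
cosine = dot_scores([l2_normalize(v) for v in toy_embeddings])

for row in cosine:
    print([round(score, 3) for score in row])
```

Entry `[i][j]` of the result compares sentence `i` with sentence `j`, so the matrix is symmetric and, for normalized vectors, has ones on the diagonal.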
Metrics
Model scores on the encodechka benchmark:
Model | CPU | GPU | size | Mean S | Mean S+W | dim |
---|---|---|---|---|---|---|
sergeyzh/LaBSE-ru-turbo | 120.40 | 8.05 | 490 | 0.789 | 0.702 | 768 |
BAAI/bge-m3 | 523.40 | 22.50 | 2166 | 0.787 | 0.696 | 1024 |
intfloat/multilingual-e5-large | 506.80 | 30.80 | 2136 | 0.780 | 0.686 | 1024 |
intfloat/multilingual-e5-base | 130.61 | 14.39 | 1061 | 0.761 | 0.669 | 768 |
sergeyzh/rubert-tiny-turbo | 5.51 | 3.25 | 111 | 0.749 | 0.667 | 312 |
intfloat/multilingual-e5-small | 40.86 | 12.09 | 449 | 0.742 | 0.645 | 384 |
cointegrated/LaBSE-en-ru | 120.40 | 8.05 | 490 | 0.739 | 0.667 | 768 |

Model | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 |
---|---|---|---|---|---|---|---|---|---|---|
sergeyzh/LaBSE-ru-turbo | 0.864 | 0.748 | 0.490 | 0.814 | 0.974 | 0.806 | 0.815 | 0.801 | 0.305 | 0.404 |
BAAI/bge-m3 | 0.864 | 0.749 | 0.510 | 0.819 | 0.973 | 0.792 | 0.809 | 0.783 | 0.240 | 0.422 |
intfloat/multilingual-e5-large | 0.862 | 0.727 | 0.473 | 0.810 | 0.979 | 0.798 | 0.819 | 0.773 | 0.224 | 0.374 |
intfloat/multilingual-e5-base | 0.835 | 0.704 | 0.459 | 0.796 | 0.964 | 0.783 | 0.802 | 0.738 | 0.235 | 0.376 |
sergeyzh/rubert-tiny-turbo | 0.828 | 0.722 | 0.476 | 0.787 | 0.955 | 0.757 | 0.780 | 0.685 | 0.305 | 0.373 |
intfloat/multilingual-e5-small | 0.822 | 0.714 | 0.457 | 0.758 | 0.957 | 0.761 | 0.779 | 0.691 | 0.234 | 0.275 |
cointegrated/LaBSE-en-ru | 0.794 | 0.659 | 0.431 | 0.761 | 0.946 | 0.766 | 0.789 | 0.769 | 0.340 | 0.414 |
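
The Mean S and Mean S+W columns appear to be plain averages of the per-task scores. The published figures are consistent with Mean S averaging the eight columns from STS through ICX, and Mean S+W averaging all ten columns including NE1 and NE2. The sketch below checks this reading against the sergeyzh/LaBSE-ru-turbo row. The grouping of tasks is an inference from the numbers, not something stated in this card.

```python
# Per-task scores for sergeyzh/LaBSE-ru-turbo, copied from the table above.
scores = {
    "STS": 0.864, "PI": 0.748, "NLI": 0.490, "SA": 0.814, "TI": 0.974,
    "IA": 0.806, "IC": 0.815, "ICX": 0.801, "NE1": 0.305, "NE2": 0.404,
}

# Assumed grouping: the first eight tasks feed Mean S, all ten feed Mean S+W.
mean_s_tasks = ["STS", "PI", "NLI", "SA", "TI", "IA", "IC", "ICX"]
all_tasks = mean_s_tasks + ["NE1", "NE2"]

mean_s = sum(scores[task] for task in mean_s_tasks) / len(mean_s_tasks)
mean_sw = sum(scores[task] for task in all_tasks) / len(all_tasks)

print(f"Mean S   = {mean_s:.3f}")   # table value: 0.789
print(f"Mean S+W = {mean_sw:.3f}")  # table value: 0.702
```

The same calculation reproduces the BAAI/bge-m3 row (0.787 and 0.696).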