ai-forever/ru-en-RoSBERTa

@@ -1651,6 +1651,26 @@ print(sim_scores.diag().tolist())
 # [0.47968706488609314, 0.940900444984436, 0.7761018872261047]
 ```
+or using prompts:
+```python
+from sentence_transformers import SentenceTransformer
+# loads model with CLS pooling
+model = SentenceTransformer("ai-forever/ru-en-RoSBERTa")
+classification = model.encode(["Он нам и <unk> не нужон ваш Интернет!", "What a time to be alive!"], prompt_name="classification")
+print(classification[0] @ classification[1].T) # 0.47968706488609314
+clustering = model.encode(["В Ярославской области разрешили работу бань, но без посетителей", "Ярославским баням разрешили работать без посетителей"], prompt_name="clustering")
+print(clustering[0] @ clustering[1].T) # 0.940900444984436
+query_embedding = model.encode("Сколько программистов нужно, чтобы вкрутить лампочку?", prompt_name="retrieval_query")
+document_embedding = model.encode("Чтобы вкрутить лампочку, требуется три программиста: один напишет программу извлечения лампочки, другой — вкручивания лампочки, а третий проведет тестирование.", prompt_name="retrieval_document")
+print(query_embedding @ document_embedding.T) # 0.7761018872261047
+```
 ## Citation
 ```
@@ -1667,4 +1687,4 @@ print(sim_scores.diag().tolist())
 ## Limitations
-The model is designed to process texts in Russian, the quality in English is unknown. Maximum input text length is limited to 512 tokens.
+The model is designed to process texts in Russian, the quality in English is unknown. Maximum input text length is limited to 512 tokens.
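
Note on the README hunk above: the added snippet scores each pair with a plain dot product (`a @ b`). A dot product equals cosine similarity only when both vectors have unit length. Whether `encode` returns unit-length embeddings for this model is an assumption here; the diff does not say so. A minimal sketch with made-up two-dimensional vectors (the helper names are hypothetical, not part of the repository):

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity, independent of vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors standing in for embeddings (illustration only).
a = [3.0, 4.0]
b = [4.0, 3.0]

raw = dot(a, b)                          # depends on vector length
unit = dot(normalize(a), normalize(b))   # dot product of unit vectors
cos = cosine(a, b)                       # matches `unit` up to rounding

print(raw, unit, cos)
```

If the embeddings turn out not to be normalized, passing `normalize_embeddings=True` to `SentenceTransformer.encode` is one way to make the dot product behave as cosine similarity.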

config_sentence_transformers.json ADDED

@@ -0,0 +1,10 @@

+{
+  "prompts": {
+    "classification": "classification: ",
+    "retrieval_query": "search_query: ",
+    "retrieval_document": "search_document: ",
+    "clustering": "clustering: "
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}
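
Note on the config above: each key under `prompts` is a name that can be passed as `prompt_name`, and the mapped string is prepended to the input text before it is tokenized. The sketch below only illustrates that lookup in plain Python; `apply_prompt` is a hypothetical helper written for this note, not a function of Sentence Transformers, and it does not load the model.

```python
import json

# Contents of the config_sentence_transformers.json added in this change.
CONFIG_JSON = """
{
  "prompts": {
    "classification": "classification: ",
    "retrieval_query": "search_query: ",
    "retrieval_document": "search_document: ",
    "clustering": "clustering: "
  },
  "default_prompt_name": null,
  "similarity_fn_name": null
}
"""

def apply_prompt(text, prompt_name=None, config=None):
    """Hypothetical helper: prepend the prefix registered under prompt_name.

    Falls back to default_prompt_name; if that is null as well,
    the text is returned unchanged.
    """
    if config is None:
        config = json.loads(CONFIG_JSON)
    name = prompt_name if prompt_name is not None else config["default_prompt_name"]
    if name is None:
        return text
    return config["prompts"][name] + text

query = apply_prompt("Сколько программистов нужно, чтобы вкрутить лампочку?", "retrieval_query")
plain = apply_prompt("What a time to be alive!")

print(query)  # text prefixed with "search_query: "
print(plain)  # unchanged, because default_prompt_name is null
```

Because `default_prompt_name` is `null`, calls that omit `prompt_name` get no prefix, so callers need to pick the task-specific name explicitly.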