sdadas/mmlw-retrieval-roberta-large

sdadas committed on Nov 26, 2023

Commit 56fecd4 · 1 Parent(s): 0523dfc

Update README.md

Files changed (1)

README.md +2 -0

@@ -25,6 +25,8 @@ The model was developed using a two-step procedure:
 - In the first step, it was initialized with a Polish RoBERTa checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We utilised [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-large-en) as teacher models for distillation.
 - The second step involved fine-tuning the obtained models with contrastive loss on the [Polish MS MARCO](https://huggingface.co/datasets/clarin-knext/msmarco-pl) training split. To improve the efficiency of contrastive training, we used large batch sizes: 1152 for small, 768 for base, and 288 for large models. Fine-tuning was conducted on a cluster of 12 A100 GPUs.
+⚠️ **2023-12-26:** We have updated the model to a new version with improved results. You can still download the previous version using the **v1** tag: `AutoModel.from_pretrained("sdadas/mmlw-retrieval-roberta-large", revision="v1")` ⚠️
 ## Usage (Sentence-Transformers)
 ⚠️ Our dense retrievers require the use of specific prefixes and suffixes when encoding texts. For this model, each query should be preceded by the prefix **"zapytanie: "** ⚠️
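
As a rough illustration of the two notes in this hunk (the query prefix and the **v1** tag), the following sketch may help. The helper names `with_query_prefix`, `load_previous_version`, and `score` are hypothetical and not part of the README. Leaving passages unprefixed is an assumption, since the text above only specifies a prefix for queries. The library imports are deferred into the functions because loading the model needs network access and the `transformers` / `sentence-transformers` packages.

```python
MODEL_ID = "sdadas/mmlw-retrieval-roberta-large"
QUERY_PREFIX = "zapytanie: "  # query prefix named in the README warning


def with_query_prefix(queries):
    """Prepend the query prefix, skipping strings that already carry it."""
    return [q if q.startswith(QUERY_PREFIX) else QUERY_PREFIX + q for q in queries]


def load_previous_version():
    """Load the earlier weights via the v1 tag, using the call quoted in the README note."""
    from transformers import AutoModel  # lazy import: downloads weights when called
    return AutoModel.from_pretrained(MODEL_ID, revision="v1")


def score(queries, passages):
    """Return a cosine-similarity matrix between prefixed queries and passages."""
    from sentence_transformers import SentenceTransformer, util
    model = SentenceTransformer(MODEL_ID)
    query_emb = model.encode(with_query_prefix(queries), convert_to_tensor=True)
    # Assumption: passages are encoded without any prefix.
    passage_emb = model.encode(passages, convert_to_tensor=True)
    return util.cos_sim(query_emb, passage_emb)
```

Making the prefix helper idempotent is a design choice here, so that queries which were already prefixed upstream are not prefixed twice.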