MMLW (muszę mieć lepszą wiadomość, "I must have a better message") are neural text encoders for Polish. The second version is based on the same foundational model ([polish-roberta-large-v2](https://huggingface.co/sdadas/polish-roberta-large-v2)), but the training process incorporated modern LLM-based English retrievers and rerankers, which led to improved results.

This model is optimized for information retrieval tasks. It can transform queries and passages into 1024-dimensional vectors.
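
For orientation, the snippet below shows how an encoder like this is typically called from the [sentence-transformers](https://www.sbert.net/) library. It is a minimal sketch: the model id is a placeholder assumption (this section does not name the repository), and it omits any query prefix the released checkpoint may expect.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder model id -- substitute the actual repository name.
model = SentenceTransformer("sdadas/mmlw-retrieval-roberta-large-v2")

query = "Jak dożyć 100 lat?"  # "How to live to be 100?"
passages = [
    "Trzeba zdrowo się odżywiać i uprawiać sport.",  # "Eat healthily and do sports."
    "Warszawa jest stolicą Polski.",                 # "Warsaw is the capital of Poland."
]

# Queries and passages are mapped to 1024-dimensional vectors.
query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

# Rank passages by cosine similarity to the query.
print(util.cos_sim(query_emb, passage_embs))
```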
The model was developed using a two-step procedure:

- In the first step, it was initialized with the Polish RoBERTa checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 20 million Polish-English text pairs. We utilised [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) as the teacher model for distillation (a sketch of this objective follows the list).
- The second step involved fine-tuning the model with a contrastive loss on a dataset of over 4 million queries. Positive and negative passages for each query were selected with the help of the [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) reranker (a loss sketch follows after the distillation example).
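
To make the first step concrete, here is a minimal sketch of the multilingual knowledge distillation objective from the linked paper, not the actual training code: the student encoder learns to reproduce the frozen teacher's embedding of the English text, both for the English text itself and for its Polish counterpart. Tensor names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(
    teacher_en: torch.Tensor,  # (B, D) frozen teacher embeddings of English texts
    student_en: torch.Tensor,  # (B, D) student embeddings of the same English texts
    student_pl: torch.Tensor,  # (B, D) student embeddings of the Polish translations
) -> torch.Tensor:
    # Push the student to place both sides of each Polish-English pair
    # where the teacher places the English side. In practice a projection
    # layer may be needed if teacher and student dimensions differ.
    return F.mse_loss(student_en, teacher_en) + F.mse_loss(student_pl, teacher_en)
```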
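
Similarly, a common form of the contrastive loss used in the second step is InfoNCE with in-batch negatives, sketched below. The temperature value and the exact negative-sampling scheme are assumptions; this section does not specify them.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(
    query_emb: torch.Tensor,    # (B, D) query embeddings
    passage_emb: torch.Tensor,  # (B, D) embeddings of the positive passages;
                                # other passages in the batch serve as negatives
    temperature: float = 0.05,  # assumed value, not from the model card
) -> torch.Tensor:
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                     # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # diagonal pairs are positive
    return F.cross_entropy(logits, labels)
```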