sdadas committed on
Commit d8df271 · verified · 1 Parent(s): d605b63

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ widget:
 MMLW (muszę mieć lepszą wiadomość) are neural text encoders for Polish. The second version is based on the same foundational model ([polish-roberta-large-v2](https://huggingface.co/sdadas/polish-roberta-large-v2)), but the training process incorporated modern LLM-based English retrievers and rerankers, which led to improved results.
 This model is optimized for information retrieval tasks. It can transform queries and passages into 1024-dimensional vectors.
 The model was developed using a two-step procedure:
- - In the first step, we adapted the model for Polish with [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) using a diverse corpus of 20 million Polish-English text pairs. We utilised [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) as the teacher models for distillation.
+ - In the first step, the model was initialized from the Polish RoBERTa checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 20 million Polish-English text pairs. We utilised [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) as the teacher model for distillation.
 - The second step involved fine-tuning the model with a contrastive loss on a dataset consisting of over 4 million queries. Positive and negative passages for each query were selected with the help of the [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) reranker.
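The README states that the model maps queries and passages to 1024-dimensional vectors for retrieval. Below is a minimal usage sketch with the sentence-transformers library; the model id `sdadas/mmlw-retrieval-roberta-large-v2` and the `"zapytanie: "` query prefix are assumptions carried over from the earlier MMLW retrieval models, so verify both against the actual model card.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Assumed checkpoint name for the v2 retrieval model -- verify on the Hub.
model = SentenceTransformer("sdadas/mmlw-retrieval-roberta-large-v2")

# Earlier MMLW retrieval models prepend "zapytanie: " to queries; this may or
# may not be required for v2 -- check the model card.
query = "zapytanie: Jak dożyć 100 lat?"  # "How to live to be 100?"
passages = [
    "Trzeba zdrowo się odżywiać i uprawiać sport.",  # "Eat healthily and do sports."
    "Warszawa jest stolicą Polski.",                  # "Warsaw is the capital of Poland."
]

query_emb = model.encode(query)        # 1024-dimensional vector
passage_embs = model.encode(passages)  # shape (2, 1024)

# Rank passages by cosine similarity to the query.
print(cos_sim(query_emb, passage_embs))
```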
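The first training step references the multilingual knowledge distillation method (Reimers & Gurevych, 2020). The sketch below illustrates the core objective only: a frozen English teacher (stella_en_1.5B_v5 in the card) supervises the Polish RoBERTa student on parallel English-Polish pairs via an MSE loss. The `encode` methods and the matching output dimensionality are assumptions for illustration, not the authors' actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, english_texts, polish_texts, optimizer):
    """One step of multilingual knowledge distillation on a batch of parallel pairs.

    Assumes `student` and `teacher` expose encode(list[str]) -> torch.Tensor and
    produce embeddings of the same dimensionality (otherwise a projection layer
    would be needed). The teacher is kept frozen.
    """
    with torch.no_grad():
        target = teacher.encode(english_texts)   # teacher embeddings (English side)

    src_en = student.encode(english_texts)       # student embeddings of the English sentences
    src_pl = student.encode(polish_texts)        # student embeddings of the Polish translations

    # Pull both sides of each parallel pair towards the teacher's English
    # embedding, so translations end up close to each other in the student space.
    loss = F.mse_loss(src_en, target) + F.mse_loss(src_pl, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```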