Sentence Similarity · Safetensors · Japanese · RAGatouille · bert · ColBERT
bclavie committed
Commit c5f384c
1 Parent(s): e289641

Update README.md

Files changed (1):
  1. README.md +17 -16
README.md CHANGED
@@ -26,7 +26,7 @@ If you just want to check out how to use the model, please check out the [Usage
 
 Welcome to JaColBERT version 2, the second release of JaColBERT, a Japanese-only document retrieval model based on [ColBERT](https://github.com/stanford-futuredata/ColBERT).
 
-JaColBERTv2 is a model that offers very strong out-of-domain generalisation. Having been trained on only a single dataset (MMarco), it reaches state-of-the-art performance on all evaluated datasets, matching or outperforming all previous models, despite their having previously been exposed to the evaluation tasks and being considerably larger.
+JaColBERTv2 is a model that offers very strong out-of-domain generalisation. Having been trained on only a single dataset (MMarco), it reaches state-of-the-art performance.
 
 JaColBERTv2 was initialised off JaColBERTv1 and trained using knowledge distillation with 31 negative examples per positive example. It was trained for 250k steps with a batch size of 32.
 
@@ -66,21 +66,22 @@ We present the first results, on two datasets: JQaRa, a passage retrieval task c
 
 JaColBERTv2 reaches state-of-the-art results on both datasets, outperforming models with 5x more parameters.
 
-|                     |     |           | JQaRa     |           |           |     | JSQuAD    |           |           |
-| ------------------- | --- | --------- | --------- | --------- | --------- | --- | --------- | --------- | --------- |
-|                     |     | NDCG@10   | MRR@10    | NDCG@100  | MRR@100   |     | R@1       | R@5       | R@10      |
-| JaColBERTv2         |     | **0.585** | **0.836** | **0.753** | **0.838** |     | **0.918** | **0.975** | **0.982** |
-| JaColBERT           |     | 0.549     | 0.811     | 0.730     | 0.814     |     | 0.906     | 0.968     | 0.978     |
-| bge-m3+all          |     | 0.576     | 0.818     | 0.745     | 0.820     |     | N/A       | N/A       | N/A       |
-| bge-m3+dense        |     | 0.539     | 0.785     | 0.721     | 0.788     |     | N/A       | N/A       | N/A       |
-| m-e5-large          |     | 0.554     | 0.799     | 0.731     | 0.801     |     | 0.865     | 0.966     | 0.977     |
-| m-e5-base           |     | 0.471     | 0.727     | 0.673     | 0.731     |     | *0.838*   | *0.955*   | 0.973     |
-| m-e5-small          |     | 0.492     | 0.729     | 0.689     | 0.733     |     | *0.840*   | *0.954*   | 0.973     |
-| GLuCoSE             |     | 0.308     | 0.518     | 0.564     | 0.527     |     | 0.645     | 0.846     | 0.897     |
-| sup-simcse-ja-base  |     | 0.324     | 0.541     | 0.572     | 0.550     |     | 0.632     | 0.849     | 0.897     |
-| sup-simcse-ja-large |     | 0.356     | 0.575     | 0.596     | 0.583     |     | 0.603     | 0.833     | 0.889     |
-| fio-base-v0.1       |     | 0.372     | 0.616     | 0.608     | 0.622     |     | 0.700     | 0.879     | 0.924     |
-|                     |     |           |           |           |           |     |           |           |           |
+
+|                     |     |           | JQaRa     |           |           |     | JSQuAD    |           |           |
+| ------------------- | --- | --------- | --------- | --------- | --------- | --- | --------- | --------- | --------- |
+|                     |     | NDCG@10   | MRR@10    | NDCG@100  | MRR@100   |     | R@1       | R@5       | R@10      |
+| JaColBERTv2         |     | **0.585** | **0.836** | **0.753** | **0.838** |     | **0.918** | **0.975** | **0.982** |
+| JaColBERT           |     | 0.549     | 0.811     | 0.730     | 0.814     |     | 0.906     | 0.968     | 0.978     |
+| bge-m3+all          |     | 0.576     | 0.818     | 0.745     | 0.820     |     | N/A       | N/A       | N/A       |
+| bge-m3+dense        |     | 0.539     | 0.785     | 0.721     | 0.788     |     | 0.850     | 0.959     | 0.976     |
+| m-e5-large          |     | 0.554     | 0.799     | 0.731     | 0.801     |     | 0.865     | 0.966     | 0.977     |
+| m-e5-base           |     | 0.471     | 0.727     | 0.673     | 0.731     |     | *0.838*   | *0.955*   | 0.973     |
+| m-e5-small          |     | 0.492     | 0.729     | 0.689     | 0.733     |     | *0.840*   | *0.954*   | 0.973     |
+| GLuCoSE             |     | 0.308     | 0.518     | 0.564     | 0.527     |     | 0.645     | 0.846     | 0.897     |
+| sup-simcse-ja-base  |     | 0.324     | 0.541     | 0.572     | 0.550     |     | 0.632     | 0.849     | 0.897     |
+| sup-simcse-ja-large |     | 0.356     | 0.575     | 0.596     | 0.583     |     | 0.603     | 0.833     | 0.889     |
+| fio-base-v0.1       |     | 0.372     | 0.616     | 0.608     | 0.622     |     | 0.700     | 0.879     | 0.924     |
+|                     |     |           |           |           |           |     |           |           |           |
 
 
 # Usage
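
The README text in the first hunk describes knowledge distillation with 31 negatives per positive, i.e. 32-way scoring per query, which follows the ColBERTv2-style recipe. As a rough illustration only, not the authors' actual training code, here is a minimal PyTorch sketch of such an n-way distillation objective; the batch shape, the random score tensors, and the use of `F.kl_div` are assumptions for the sake of the example:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 8 queries, each scored against 1 positive + 31 negatives.
# In real training, student_scores would come from the ColBERT late-interaction
# (MaxSim) scorer and teacher_scores from a stronger teacher model; both are
# random placeholders here.
student_scores = torch.randn(8, 32, requires_grad=True)
teacher_scores = torch.randn(8, 32)

# Distillation loss: KL divergence between the teacher's and the student's
# softmax distributions over the 32 candidate passages of each query.
loss = F.kl_div(
    F.log_softmax(student_scores, dim=-1),
    F.log_softmax(teacher_scores, dim=-1),
    reduction="batchmean",
    log_target=True,
)
loss.backward()  # gradients flow back into the student scorer
```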
 
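The diff cuts off at the README's Usage section. Since the model page is tagged with RAGatouille, retrieval presumably goes through that library; below is a minimal sketch under that assumption. The model id matches this repository, but the index name, documents, and query are placeholders:

```python
from ragatouille import RAGPretrainedModel

# Load JaColBERTv2 through RAGatouille.
RAG = RAGPretrainedModel.from_pretrained("bclavie/JaColBERTv2")

# Placeholder Japanese passages to index.
documents = [
    "東京は日本の首都であり、世界最大級の都市圏を形成している。",
    "京都には多くの歴史的な寺院や神社が残っている。",
]
RAG.index(collection=documents, index_name="jacolbert_demo")

# Retrieve the top-2 passages for a Japanese query.
results = RAG.search("日本の首都はどこですか?", k=2)
for r in results:
    print(r["rank"], r["score"], r["content"])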