bclavie
/

fio-base-japanese-v0.1

Sentence Similarity

sentence-transformers

feature-extraction

bclavie committed on Dec 19, 2023

Commit

6b4f1ef

1 Parent(s): 214d387

Update README.md

README.md +2 -0

@@ -42,6 +42,8 @@ Retrieval:
 #### Results
+> ⚠️ WARNING: fio-base-japanese-v0.1 has seen textual entailment tasks during its training, which is _not_ the case for the other Japanese-only models in this table. This gives Fio an unfair advantage over the previous best results, `cl-nagoya/sup-simcse-ja-[base|large]`. In mid-training evaluations, this did not seem to greatly affect performance. However, JSICK (NLI set) was included in the training data, so it is impossible to fully remove this contamination at the moment. I intend to fix this in a future release, but please keep this in mind as you view the results (see the JSQuAD results in the associated blog post for a fully unseen comparison, although one focused on retrieval).
 This is adapted and truncated (to keep only the most popular models) from [oshizo's benchmarking github repo](https://github.com/oshizo/JapaneseEmbeddingEval), please check it out for more information and give it a star as it was very useful!
 Italic denotes best model for its size when a smaller model outperforms a bigger one (base/large | 768/1024), bold denotes best overall.