There is a weird case.
Hi, jhgan :)
I'm working on a project about Korean job names.
While testing your model, I found a weird case.
Here it is:
'''
Source Sentence: 내 직업은 판사예요.
Sentences to compare to:
판사 0.075
법무사 0.498
교도관 0.380
변호사 0.381
로펌대표 0.400
'''
판사 (judge) scored far lower than I expected.
So I retrained the model, following the same steps you describe here (https://github.com/jhgan00/ko-sentence-transformers),
and the case was fixed perfectly.
So I guess there was a small mistake when this model was uploaded.
If you don't mind, please look into this issue.
Thx!
Hello, I've looked into your issue and was able to reproduce the case you reported.
However, the benchmark result shows no significant problem with the model (it achieves a Pearson correlation of about 0.8477 on the STS test set; see benchmark.py for the benchmark script).
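For reference, the Pearson correlation mentioned above measures how linearly the model's similarity scores track the human-annotated STS scores. The snippet below is only a minimal sketch of that metric, not the contents of benchmark.py; the `pearson` helper and the numbers in it are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: model cosine similarities vs. gold similarity labels.
predicted = [0.9, 0.2, 0.6, 0.4]
gold = [4.5, 1.0, 3.2, 2.0]
score = pearson(predicted, gold)
```

A high correlation over the whole test set says the scores are good on average, which is compatible with an occasional surprising score on a single pair.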
My guess is that, because the model is trained to extract sentence-level semantics (not word-level semantics or lexical features), it may sometimes fail when not enough context is given.
I got reasonable results when comparing against full sentences rather than single words.
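The comparison above can be sketched roughly as follows: encode the source sentence and each candidate, then rank the candidates by cosine similarity. This is a minimal sketch, not the script used in this thread. `rank_candidates` and `load_encoder` are hypothetical helper names, and `load_encoder` assumes the sentence-transformers package is installed and that you pass the name of whichever checkpoint you are testing.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_candidates(
    encode: Callable[[List[str]], Sequence[Vector]],
    source: str,
    candidates: List[str],
) -> List[Tuple[str, float]]:
    """Score every candidate against the source, highest similarity first."""
    vectors = encode([source, *candidates])
    source_vec, candidate_vecs = vectors[0], vectors[1:]
    scored = [
        (text, cosine_similarity(source_vec, vec))
        for text, vec in zip(candidates, candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def load_encoder(model_name: str) -> Callable[[List[str]], Sequence[Vector]]:
    """Wrap a sentence-transformers model as an `encode` callable.

    Assumption: `model_name` is the checkpoint under test. The import is
    lazy so the helpers above work without the package installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(list(texts)).tolist()
```

To try the sentence-level variant, pass short full sentences as candidates instead of bare job titles and compare the two rankings.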
Maybe you got a different result because randomness is not perfectly controlled in my training script (but yours looks better to me!).
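One common way to reduce this kind of run-to-run variation is to seed every random number generator before training. The sketch below is an assumption about how that could be done, not what the training script in the repository does; `set_seed` is a hypothetical helper, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch RNGs (the latter two if available)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        # Safe to call even when no GPU is present.
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_seed(42)
```

Even with fixed seeds, some GPU kernels are non-deterministic, so two training runs can still end up with slightly different weights.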
Thanks.