There is a weird case.

#2
by hbhb - opened

Hi, jhgan :)

I'm working on a project about Korean job names.
While testing your model, I found a weird case.
Here it is:
'''
Source Sentence: 내 직업은 판사예요. ("My job is a judge.")

Sentences to compare to:
νŒμ‚¬ 0.075
법무사 0.498
ꡐ도관 0.380
λ³€ν˜Έμ‚¬ 0.381
λ‘œνŽŒλŒ€ν‘œ 0.400
'''
νŒμ‚¬ showed way lower score than I expected.

So I retrained the model myself, just like you did here (https://github.com/jhgan00/ko-sentence-transformers),

and the weird case was fixed perfectly.
So I guess there was a small mistake when uploading this model.

If you don't mind, please check this issue.

Thx!

Hello, I've checked out your issue and could reproduce the case you reported.
However, the benchmark results show no significant problem with the model (it achieves about 0.8477 Pearson correlation on the STS test set; see benchmark.py for the benchmark script).
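For reference, the Pearson correlation reported by an STS benchmark measures how well the model's predicted similarity scores track the human-annotated gold scores. A minimal sketch of the metric (the score values here are toy numbers, not the actual benchmark data):

```python
import numpy as np

def pearson(pred, gold):
    """Pearson correlation between predicted similarities and gold scores."""
    pred, gold = np.asarray(pred, dtype=float), np.asarray(gold, dtype=float)
    pc = pred - pred.mean()
    gc = gold - gold.mean()
    return float((pc @ gc) / (np.linalg.norm(pc) * np.linalg.norm(gc)))

# Toy example: model similarities vs. human STS annotations (0-5 scale).
pred = [0.92, 0.35, 0.71, 0.10]
gold = [4.8, 1.5, 3.9, 0.4]
print(round(pearson(pred, gold), 4))
```

A value near 1.0 means the model ranks sentence pairs almost exactly as the annotators did, which is why ~0.85 indicates the uploaded model is healthy overall.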
I guess this is because the model is trained to extract sentence-level semantics (not word-level semantics or lexical features), so it may sometimes fail when it isn't given sufficient context.
I could get reasonable results by comparing full sentences instead of single words.
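On the sentence-level vs. word-level point: SBERT-style models typically mean-pool the token embeddings into a single sentence vector, so a one-word input gives the pooling step very little context to work with. A toy sketch of mean pooling (shapes and numbers are illustrative, not the model's actual weights):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    emb = np.asarray(token_embeddings, dtype=float)          # (tokens, dim)
    mask = np.asarray(attention_mask, dtype=float)[:, None]  # (tokens, 1)
    return (emb * mask).sum(axis=0) / mask.sum()

# A 4-token "sentence" with one padding token, using 3-dim toy embeddings.
tokens = [[1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0],
          [2.0, 4.0, 1.0],
          [9.0, 9.0, 9.0]]  # padding row, masked out below
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # -> [2. 2. 1.]
```

With only one real token, the "mean" is just that token's embedding, so the sentence vector carries no compositional context — consistent with the single-word failure reported above.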

[image: similarity results when comparing full sentences]

Maybe you got a different result because randomness is not perfectly controlled in my training script (but yours looks better to me!).

Thanks.