jhgan/ko-sroberta-multitask



There is a weird case.

#2

by hbhb - opened Sep 26, 2022


Hi, jhgan :)

I'm working on a project about Korean job names.
While testing your model, I found a strange case.
Here it is:
'''
Source Sentence: 내 직업은 판사예요. (My job is a judge.)

Sentences to compare to:
판사 (judge) 0.075
법무사 (judicial scrivener) 0.498
교도관 (correctional officer) 0.380
변호사 (lawyer) 0.381
로펌대표 (law firm head) 0.400
'''
판사 (judge) scored far lower than I expected.
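
For anyone who wants to check this case locally, here is a minimal sketch of how such scores could be computed. It assumes the scores above are cosine similarities between embeddings, which is a guess about how they were produced and not something confirmed in this thread. `score_candidates` and `load_encoder` are hypothetical helper names introduced only for illustration; `load_encoder` relies on the `sentence-transformers` package and downloads the model on first use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_candidates(encode, source, candidates):
    """Score each candidate against the source sentence.

    `encode` is any callable that maps a list of strings to a
    sequence of embedding vectors (one vector per string).
    """
    vectors = [[float(x) for x in vec] for vec in encode([source] + list(candidates))]
    source_vec, candidate_vecs = vectors[0], vectors[1:]
    return {c: cosine(source_vec, v) for c, v in zip(candidates, candidate_vecs)}


def load_encoder(model_name="jhgan/ko-sroberta-multitask"):
    """Hypothetical helper: returns the model's encode function.

    The import is done lazily so the rest of the sketch runs without
    sentence-transformers installed.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode


# Example usage (requires sentence-transformers and a model download):
# encode = load_encoder()
# jobs = ["판사", "법무사", "교도관", "변호사", "로펌대표"]
# for job, score in score_candidates(encode, "내 직업은 판사예요.", jobs).items():
#     print(job, round(score, 3))
```

Exact numbers may differ from the ones listed above depending on library versions and hardware.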

So I retrained the model, following the same procedure as in your repository (https://github.com/jhgan00/ko-sentence-transformers).

After retraining, the case was completely fixed.
So I suspect a small mistake was made when this model was uploaded.

If you don't mind, please look into this issue.

Thx!

jhgan

Owner Sep 26, 2022

Hello, I checked your issue and was able to reproduce the case you reported.
However, the benchmark results show no significant problem with the model: it achieves a Pearson correlation of about 0.8477 on the STS test set (see benchmark.py for the benchmark script).
My guess is that the model is trained to extract sentence-level semantics (not word-level semantics or lexical features), so it may sometimes fail when not enough context is given.
I got reasonable results when comparing against sentences rather than single words.
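
One way to read "sentences, not words" is to wrap each job name in a short sentence before embedding it. The sketch below is only an illustration of that idea: the template mirrors the source sentence from the original post and is an assumption, as are the helper names `has_final_consonant` and `to_sentence`. The copula is chosen from the last Hangul syllable so the generated sentences stay grammatical.

```python
HANGUL_FIRST = 0xAC00  # first precomposed Hangul syllable (가)
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable (힣)


def has_final_consonant(word):
    """Return True if the last Hangul syllable of `word` ends in a consonant."""
    code = ord(word[-1])
    if not (HANGUL_FIRST <= code <= HANGUL_LAST):
        raise ValueError("expected a word ending in a Hangul syllable")
    # Each initial+medial pair spans 28 code points; offset 0 means no final consonant.
    return (code - HANGUL_FIRST) % 28 != 0


def to_sentence(job):
    """Wrap a job name in the (assumed) template '내 직업은 ...예요/이에요.'"""
    copula = "이에요" if has_final_consonant(job) else "예요"
    return f"내 직업은 {job}{copula}."


jobs = ["판사", "법무사", "교도관", "변호사", "로펌대표"]
candidate_sentences = [to_sentence(job) for job in jobs]
```

The resulting `candidate_sentences` can then be embedded and compared with the source sentence in place of the bare words.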

You may have gotten a different result because randomness is not perfectly controlled in my training script (but your results look better to me!).

Thanks.
