Is this model case sensitive?

by Jane001 - opened Aug 31, 2022

Discussion

Jane001

Aug 31, 2022

Is this model case sensitive?

sjrhuschlee

deepset org Aug 31, 2022

Yes, I believe so. A good way to check is to open tokenizer_config.json under Files and versions. In particular, it contains this line:

{"do_lower_case": false, "model_max_length": 512, "full_tokenizer_file": null}

where the option do_lower_case is set to false. This means the tokenizer does not lowercase its input, so it should preserve the casing of the underlying text.
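The same check can be done programmatically. Below is a minimal sketch using only the Python standard library; it parses the config line quoted above rather than downloading anything. The helper name `preserves_case` is hypothetical, and treating a missing `do_lower_case` key as "does not lowercase" is an assumption, not documented behavior of any particular tokenizer.

```python
import json

# The line quoted above from tokenizer_config.json.
CONFIG_LINE = '{"do_lower_case": false, "model_max_length": 512, "full_tokenizer_file": null}'


def preserves_case(config_text: str) -> bool:
    """Return True if the config does not ask the tokenizer to lowercase input.

    Assumption: a missing "do_lower_case" key is treated as False.
    """
    config = json.loads(config_text)
    return not config.get("do_lower_case", False)


print(preserves_case(CONFIG_LINE))
```

To check a downloaded copy of the file instead, read its contents and pass the text to the same helper.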

Jane001

Aug 31, 2022

How can I make it case-insensitive?

sjrhuschlee

deepset org Aug 31, 2022

One option would be to change the do_lower_case parameter in the config file to true. However, I would not recommend this: the model was trained on case-sensitive data, so its performance could degrade considerably.
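For illustration only, here is a rough sketch of what flipping that parameter looks like on a local copy of the config text, using only the Python standard library. The helper name `set_lower_case` is hypothetical, and whether the tokenizer class for this model actually honors the flag is an assumption. The caveat above still applies: the model was trained on cased text.

```python
import json

# The config line as it appears in tokenizer_config.json.
ORIGINAL = '{"do_lower_case": false, "model_max_length": 512, "full_tokenizer_file": null}'


def set_lower_case(config_text: str, lower: bool) -> str:
    """Return the config text with "do_lower_case" set to `lower`.

    All other keys are left untouched.
    """
    config = json.loads(config_text)
    config["do_lower_case"] = lower
    return json.dumps(config)


updated = set_lower_case(ORIGINAL, True)
print(updated)
# {"do_lower_case": true, "model_max_length": 512, "full_tokenizer_file": null}
```

The updated text would then be written back to the local tokenizer_config.json before loading the tokenizer from that directory.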

Jane001

Sep 5, 2022

edited Sep 5, 2022

Agreed. It would be great if there were a case-insensitive version, so that using upper or lower case in the question would not affect performance. Currently, if the casing in the question doesn't match that in the passage, such as using north america instead of North America, the model doesn't give a correct answer.
