Is this model case sensitive?
Yes, I believe so. A good way to check this is to look at the tokenizer_config.json under Files and versions. In particular, you can see this line
{"do_lower_case": false, "model_max_length": 512, "full_tokenizer_file": null}
where the option do_lower_case is set to false. This means the tokenizer should preserve the case of the underlying text.
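If you would rather check this in code than by eye, here is a minimal sketch using only the Python standard library. It parses the config line quoted above; the function name preserves_case is made up for illustration, and returning None when the key is missing is my own choice, since the tokenizer's default is not stated in this thread.

```python
import json
from typing import Optional


def preserves_case(config_text: str) -> Optional[bool]:
    """Return True if the config says text is NOT lowercased.

    Returns None when "do_lower_case" is absent, because the default
    then depends on the tokenizer class (not covered here).
    """
    config = json.loads(config_text)
    if "do_lower_case" not in config:
        return None
    return not config["do_lower_case"]


# The line copied from tokenizer_config.json in the post above.
config_line = '{"do_lower_case": false, "model_max_length": 512, "full_tokenizer_file": null}'

print(preserves_case(config_line))
```

With a downloaded copy of the file, you could pass its contents to the same function instead of the hard-coded string.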
How can I make it case-insensitive?
One option would be to change the do_lower_case parameter in the config file to true. However, I would not recommend this, since this model was trained on case-sensitive data, so its performance could degrade considerably.
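For completeness, here is a rough sketch of what that config change amounts to, assuming you are working on a local copy of tokenizer_config.json. The helper name with_lower_casing is hypothetical, and the caveat above still applies: the model was trained on cased text, so this is for experimentation only.

```python
import json


def with_lower_casing(config_text: str) -> str:
    """Return the config JSON with do_lower_case switched to true."""
    config = json.loads(config_text)
    config["do_lower_case"] = True  # all other keys are left untouched
    return json.dumps(config)


# The original line from tokenizer_config.json.
original = '{"do_lower_case": false, "model_max_length": 512, "full_tokenizer_file": null}'

print(with_lower_casing(original))
```

You would then write the result back to your local copy and load the tokenizer from that directory, and compare answers on a few cased and uncased questions before relying on it.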
Agreed. It would be great if there were a case-insensitive version, so that using upper or lower case in the question would not affect performance. Currently, if the case in the question doesn't match that in the passage, such as using "north america" instead of "North America", the model doesn't give a correct answer.