Not detecting properly

#2
by Rrila - opened

As stated on the other model, but just to keep a record of it on this one:

It can't properly detect the languages in this sentence:

Hallo, Guten Tag how are you? calm down dices tu nombre

To answer your first question, whether I will be adding more languages: there is currently no plan to add new languages, as training on a larger dataset would require heavy computational resources (which I do not have, unfortunately).

For the second question: as I understand from my output, your concern is that "calm down" is not predicted as English.
From my point of view, the reason might be that "calm down" is also quite commonly used in Spanish. So it is possible that the model treats these two tokens as Spanish by taking into account the context of the whole sequence (calm down dices tu nombre). In other words, the model detects a "sequence of tokens" from a single language (in this case, Spanish), even when some tokens appear to be from another language (in this case, English) but are also commonly used in that single language (Spanish). That's why, if you use "calm down" along with some other English tokens, "calm down" will be predicted as English, not Spanish.
On the other hand, "Hallo, Guten Tag" and "how are you?" are not commonly used together. To elaborate, it is not common to find "Hallo, Guten Tag" in English text or "how are you?" in German text. That's why the model was able to distinguish them perfectly.
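To make the "sequence of tokens" idea concrete, here is a minimal sketch that merges consecutive tokens sharing a predicted label into spans. It does not call the model: the helper name `group_by_language`, the `de`/`en`/`es` label codes, and the hard-coded per-token labels are all assumptions, chosen only to mirror the behaviour described above.

```python
from itertools import groupby


def group_by_language(tagged_tokens):
    """Merge consecutive (token, label) pairs that share a label into spans."""
    spans = []
    for label, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        spans.append((label, " ".join(token for token, _ in group)))
    return spans


# Illustrative labels only (not real model output): they follow the behaviour
# described in this thread, where "calm down" is absorbed into the Spanish span.
tagged = [
    ("Hallo,", "de"), ("Guten", "de"), ("Tag", "de"),
    ("how", "en"), ("are", "en"), ("you?", "en"),
    ("calm", "es"), ("down", "es"),
    ("dices", "es"), ("tu", "es"), ("nombre", "es"),
]

spans = group_by_language(tagged)
for label, text in spans:
    print(f"{label}: {text}")
```

Viewed this way, "calm down" does not form its own English span because its neighbours are Spanish, whereas "how are you?" sits between two clearly different languages and stays separate.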

I hope it helps. Thanks.

Hi, just in case it helps: I'm a native Spanish speaker (from Spain), and "calm down" is definitely not Spanish, nor is it even commonly used in Spain. It might be that the dataset used to train the model was Mexican, or some other sort of Spanish variety. So, although there might not be a solution as such, the prediction is clearly wrong, as Spanish from Spain is and should be the only one tagged as proper Spanish (not trying to insult anyone, just clarifying).

Well, then, it's probably one of the model's limitations.

msislam changed discussion status to closed
