Language

#3
by johnlockejrr - opened

Is the model trained only on Ivrit, or also on Classical Hebrew proper (Biblical, Tannaitic, etc.)?

Israel National NLP Program org

The training data does include CH, and the model is trained on it, but its portion is probably well hidden under the main Modern Hebrew distribution mass. Having said that, I'd expect it to understand CH relatively well, although I doubt it can generate it.
Dicta reports a better model for that, but I couldn't find its repo. The paper is available here:
https://arxiv.org/pdf/2309.14568

Just wanted to be sure, because I work with CH only. Thank you for the clarification!
