---
license: cc-by-4.0
language:
- he
inference: false
---

# DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

State-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2308.16687).

This is the fine-tuned model for the morphological tagging task.

For the bert-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

Sample usage:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-morph')
model = AutoModel.from_pretrained('dicta-il/dictabert-morph', trust_remote_code=True)

model.eval()

sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
print(model.predict([sentence], tokenizer))
```

Output:

```json
[
  {
    "text": "בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים",
    "tokens": [
      {
        "token": "בשנת",
        "pos": "NOUN",
        "feats": {
          "Gender": "Fem",
          "Number": "Sing"
        },
        "prefixes": ["ADP"],
        "suffix": false
      },
      {
        "token": "1948",
        "pos": "NUM",
        "feats": {},
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "השלים",
        "pos": "VERB",
        "feats": {
          "Gender": "Masc",
          "Number": "Sing",
          "Person": "3",
          "Tense": "Past"
        },
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "אפרים",
        "pos": "PROPN",
        "feats": {},
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "קישון",
        "pos": "PROPN",
        "feats": {},
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "את",
        "pos": "ADP",
        "feats": {},
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "לימודיו",
        "pos": "NOUN",
        "feats": {
          "Gender": "Masc",
          "Number": "Plur"
        },
        "prefixes": [],
        "suffix": "PRON",
        "suffix_feats": {
          "Gender": "Masc",
          "Number": "Sing",
          "Person": "3"
        }
      },
      {
        "token": "בפיסול",
        "pos": "NOUN",
        "feats": {
          "Gender": "Masc",
          "Number": "Sing"
        },
        "prefixes": ["ADP"],
        "suffix": false
      },
      {
        "token": "מתכת",
        "pos": "NOUN",
        "feats": {
          "Gender": "Fem",
          "Number": "Sing"
        },
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "ובתולדות",
        "pos": "NOUN",
        "feats": {
          "Gender": "Fem",
          "Number": "Plur"
        },
        "prefixes": ["CCONJ", "ADP"],
        "suffix": false
      },
      {
        "token": "האמנות",
        "pos": "NOUN",
        "feats": {
          "Gender": "Fem",
          "Number": "Sing"
        },
        "prefixes": ["DET"],
        "suffix": false
      },
      {
        "token": "והחל",
        "pos": "VERB",
        "feats": {
          "Gender": "Masc",
          "Number": "Sing",
          "Person": "3",
          "Tense": "Past"
        },
        "prefixes": ["CCONJ"],
        "suffix": false
      },
      {
        "token": "לפרסם",
        "pos": "VERB",
        "feats": {},
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "מאמרים",
        "pos": "NOUN",
        "feats": {
          "Gender": "Masc",
          "Number": "Plur"
        },
        "prefixes": [],
        "suffix": false
      },
      {
        "token": "הומוריסטיים",
        "pos": "ADJ",
        "feats": {
          "Gender": "Masc",
          "Number": "Plur"
        },
        "prefixes": [],
        "suffix": false
      }
    ]
  }
]
```

## Citation

If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew```

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg
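
## Reading the output

The sample output shown earlier nests prefixes, features, and an optional suffix inside each token entry. A minimal sketch of flattening those entries into one row per token follows. It assumes that `model.predict` returns plain Python lists and dicts mirroring the JSON shown, with `suffix` set to `False` when there is no suffix. The helpers `format_feats` and `summarize_token` are hypothetical names introduced here for illustration and are not part of the model's API. The two entries are copied from the sample output so the snippet runs without downloading the model.

```python
# Two token entries copied from the sample output (JSON false -> Python False).
sample_tokens = [
    {"token": "בשנת", "pos": "NOUN",
     "feats": {"Gender": "Fem", "Number": "Sing"},
     "prefixes": ["ADP"], "suffix": False},
    {"token": "לימודיו", "pos": "NOUN",
     "feats": {"Gender": "Masc", "Number": "Plur"},
     "prefixes": [], "suffix": "PRON",
     "suffix_feats": {"Gender": "Masc", "Number": "Sing", "Person": "3"}},
]


def format_feats(feats):
    """Render a feature dict as 'Key=Value|Key=Value', or '_' if it is empty."""
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items())) or "_"


def summarize_token(tok):
    """Flatten one token entry into (token, segment tags, feats, suffix feats).

    Hypothetical helper for illustration; not part of the model's API.
    """
    # Prefix tags come first, then the main POS tag, then the suffix tag if any.
    tags = list(tok["prefixes"]) + [tok["pos"]]
    if tok["suffix"]:  # assumed to be False when the token has no suffix
        tags.append(tok["suffix"])
    return (
        tok["token"],
        "+".join(tags),
        format_feats(tok["feats"]),
        # `suffix_feats` only appears on tokens that carry a suffix.
        format_feats(tok.get("suffix_feats", {})),
    )


rows = [summarize_token(t) for t in sample_tokens]
for row in rows:
    print("\t".join(row))
```

With real predictions, the same loop could run over `result["tokens"]` for each `result` in the list returned by `model.predict`, under the same assumption about the return type.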