Shaltiel committed on
Commit
e484270
•
1 Parent(s): 74d5ff2

Update README.md

Files changed (1)
  1. README.md +63 -0
README.md CHANGED
@@ -1,3 +1,66 @@
1
  ---
2
  license: cc-by-4.0
3
  ---
1
  ---
2
  license: cc-by-4.0
3
+ language:
4
+ - he
5
  ---
6
+ # DictaBERT-Large: A State-of-the-Art BERT-Large Suite for Modern Hebrew
7
+
8
+ A state-of-the-art language model for Hebrew; see the [DictaBERT paper](https://arxiv.org/abs/2308.16687).
9
+
10
+ This is the BERT-large base model pretrained with the masked-language-modeling objective.
11
+
12
+ For the BERT-base models for other tasks, see the [DictaBERT collection](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).
13
+
14
+ For the BERT-large models for other tasks, see [to-be-added].
15
+
16
+
17
+ Sample usage:
18
+
19
+ ```python
20
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
21
+
22
+ tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-large')
23
+ model = AutoModelForMaskedLM.from_pretrained('dicta-il/dictabert-large')
24
+
25
+ model.eval()
26
+
27
+ sentence = 'בשנת 1948 השלים אפרים קישון את [MASK] בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
28
+
29
+ output = model(tokenizer.encode(sentence, return_tensors='pt'))
30
+ # the [MASK] is at index 7 (zero-based, counting [CLS] as index 0)
31
+ import torch
32
+ top_2 = torch.topk(output.logits[0, 7, :], 2)[1]
33
+ print('\n'.join(tokenizer.convert_ids_to_tokens(top_2))) # should print מחקרו / התמחותו
34
+
35
+ ```
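
The sample above hard-codes the position of `[MASK]`, which only holds for that exact sentence. A minimal sketch of looking the position up instead is shown below. The helper names `find_mask_positions` and `top_k_indices` are hypothetical (they are not part of `transformers` or this repository), and the token ids and scores are made-up toy values standing in for real tokenizer and model output.

```python
from typing import List, Sequence


def find_mask_positions(input_ids: Sequence[int], mask_token_id: int) -> List[int]:
    """Return every index in input_ids that holds the mask token id."""
    return [i for i, tok in enumerate(input_ids) if tok == mask_token_id]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data for illustration only; these ids are assumptions, not the real vocabulary.
# Here 2 plays the role of [CLS], 3 of [SEP], and 4 of [MASK].
toy_input_ids = [2, 101, 102, 103, 104, 105, 106, 4, 107, 3]
toy_mask_id = 4
toy_scores = [0.1, 2.5, -1.0, 3.2, 0.7]  # stand-in for one row of logits

mask_positions = find_mask_positions(toy_input_ids, toy_mask_id)
print(mask_positions)                # [7]
print(top_k_indices(toy_scores, 2))  # [3, 1]
```

With the real model, the same idea would presumably use `tokenizer.mask_token_id` as the mask id, the list returned by `tokenizer.encode(sentence)` as the input ids, and `output.logits[0, pos, :].tolist()` as the scores for each position found.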
36
+
37
+
38
+ ## Citation
39
+
40
+ If you use DictaBERT in your research, please cite `DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew`.
41
+
42
+ **BibTeX:**
43
+
44
+ ```bibtex
45
+ @misc{shmidman2023dictabert,
46
+ title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
47
+ author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
48
+ year={2023},
49
+ eprint={2308.16687},
50
+ archivePrefix={arXiv},
51
+ primaryClass={cs.CL}
52
+ }
53
+ ```
54
+
55
+ ## License
56
+
57
+ Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
58
+
59
+ This work is licensed under a
60
+ [Creative Commons Attribution 4.0 International License][cc-by].
61
+
62
+ [![CC BY 4.0][cc-by-image]][cc-by]
63
+
64
+ [cc-by]: http://creativecommons.org/licenses/by/4.0/
65
+ [cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
66
+ [cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg