---
license: cc-by-4.0
language:
- he
---
# DictaBERT-Large: A State-of-the-Art BERT-Large Suite for Modern Hebrew

A state-of-the-art language model for Hebrew, introduced in [this paper](https://arxiv.org/abs/2308.16687).

This is the BERT-large model fine-tuned for question answering on the [HeQ](https://u.cs.biu.ac.il/~yogo/heq.pdf) dataset.

For the bert-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

Sample usage:

```python
from transformers import pipeline

# Load the fine-tuned extractive question-answering model from the Hugging Face Hub
oracle = pipeline('question-answering', model='dicta-il/dictabert-large-heq')


context = 'בניית פרופילים של משתמשים נחשבת על ידי רבים כאיום פוטנציאלי על הפרטיות. מסיבה זו הגבילו חלק מהמדינות באמצעות חקיקה את המידע שניתן להשיג באמצעות עוגיות ואת אופן השימוש בעוגיות. ארצות הברית, למשל, קבעה חוקים נוקשים בכל הנוגע ליצירת עוגיות חדשות. חוקים אלו, אשר נקבעו בשנת 2000, נקבעו לאחר שנחשף כי המשרד ליישום המדיניות של הממשל האמריקאי נגד השימוש בסמים (ONDCP) בבית הלבן השתמש בעוגיות כדי לעקוב אחרי משתמשים שצפו בפרסומות נגד השימוש בסמים במטרה לבדוק האם משתמשים אלו נכנסו לאתרים התומכים בשימוש בסמים. דניאל בראנט, פעיל הדוגל בפרטיות המשתמשים באינטרנט, חשף כי ה-CIA שלח עוגיות קבועות למחשבי אזרחים במשך עשר שנים. ב-25 בדצמבר 2005 גילה בראנט כי הסוכנות לביטחון לאומי (ה-NSA) השאירה שתי עוגיות קבועות במחשבי מבקרים בגלל שדרוג תוכנה. לאחר שהנושא פורסם, הם ביטלו מיד את השימוש בהן.'
question = 'כיצד הוגבל המידע שניתן להשיג באמצעות העוגיות?'

oracle(question=question, context=context)
```

Output:
```json
{
    "score": 0.9999945163726807,
    "start": 101,
    "end": 114,
    "answer": "באמצעות חקיקה"
}
```
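
The `start` and `end` fields in the output are character offsets into `context`, with `end` exclusive, so slicing the context with them should reproduce `answer`. The snippet below is a minimal sketch of that check. The helper name `answer_span` and the shortened `context_prefix` string are assumptions made for illustration, and the `result` dictionary is copied from the output above rather than produced by running the model.

```python
def answer_span(context: str, result: dict) -> str:
    """Return the substring of `context` that a QA result points at.

    Hypothetical helper: assumes `start`/`end` are character offsets
    with an exclusive end, as in the sample output above.
    """
    return context[result["start"]:result["end"]]


# First part of the sample context, long enough to contain the answer span.
context_prefix = (
    'בניית פרופילים של משתמשים נחשבת על ידי רבים כאיום פוטנציאלי על הפרטיות. '
    'מסיבה זו הגבילו חלק מהמדינות באמצעות חקיקה את המידע'
)

# Copied from the sample output above (not recomputed here).
result = {
    "score": 0.9999945163726807,
    "start": 101,
    "end": 114,
    "answer": "באמצעות חקיקה",
}

# The offsets and the answer text should agree.
assert answer_span(context_prefix, result) == result["answer"]
```

Working from the offsets rather than from the `answer` string is useful when the answer needs to be highlighted in the original text or when the same phrase occurs more than once in the context.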

## Citation

If you use DictaBERT in your research, please cite *DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew*.

**BibTeX:**

```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew}, 
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg