---
license: cc-by-4.0
language:
- he
---
# DictaBERT-Large: A State-of-the-Art BERT-Large Suite for Modern Hebrew
A state-of-the-art language model for Hebrew, released with [this paper](https://arxiv.org/abs/2308.16687).
This is the BERT-large model fine-tuned for question answering on the [HeQ](https://u.cs.biu.ac.il/~yogo/heq.pdf) dataset.
For the BERT-base models fine-tuned for other tasks, see [this collection](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).
Sample usage:
```python
from transformers import pipeline

# Build an extractive question-answering pipeline backed by the fine-tuned HeQ checkpoint.
oracle = pipeline('question-answering', model='dicta-il/dictabert-large-heq')
context = 'בניית פרופילים של משתמשים נחשבת על ידי רבים כאיום פוטנציאלי על הפרטיות. מסיבה זו הגבילו חלק מהמדינות באמצעות חקיקה את המידע שניתן להשיג באמצעות עוגיות ואת אופן השימוש בעוגיות. ארצות הברית, למשל, קבעה חוקים נוקשים בכל הנוגע ליצירת עוגיות חדשות. חוקים אלו, אשר נקבעו בשנת 2000, נקבעו לאחר שנחשף כי המשרד ליישום המדיניות של הממשל האמריקאי נגד השימוש בסמים (ONDCP) בבית הלבן השתמש בעוגיות כדי לעקוב אחרי משתמשים שצפו בפרסומות נגד השימוש בסמים במטרה לבדוק האם משתמשים אלו נכנסו לאתרים התומכים בשימוש בסמים. דניאל בראנט, פעיל הדוגל בפרטיות המשתמשים באינטרנט, חשף כי ה-CIA שלח עוגיות קבועות למחשבי אזרחים במשך עשר שנים. ב-25 בדצמבר 2005 גילה בראנט כי הסוכנות לביטחון לאומי (ה-NSA) השאירה שתי עוגיות קבועות במחשבי מבקרים בגלל שדרוג תוכנה. לאחר שהנושא פורסם, הם ביטלו מיד את השימוש בהן.'
question = 'כיצד הוגבל המידע שניתן להשיג באמצעות העוגיות?'
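# The pipeline returns a dict with the answer text, its character offsets in the context, and a confidence score.
result = oracle(question=question, context=context)
print(result)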
```
Output:
```json
{
  "score": 0.9999945163726807,
  "start": 101,
  "end": 114,
  "answer": "באמצעות חקיקה"
}
```
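The `start` and `end` fields are character offsets into `context`, with `end` exclusive, so slicing the context with them should recover `answer`. The following is a minimal sketch that checks this against the values from the output above without loading the model. The names `extract_span` and `context_prefix` are hypothetical, introduced only for illustration, and only the opening two sentences of the sample context are reproduced, since the offsets count from the start of the string.

```python
def extract_span(context: str, result: dict) -> str:
    """Return the substring of `context` selected by the result's character offsets."""
    return context[result["start"]:result["end"]]


# Result dict copied from the sample output above.
result = {
    "score": 0.9999945163726807,
    "start": 101,
    "end": 114,
    "answer": "באמצעות חקיקה",
}

# Opening two sentences of the sample context (assumption: the rest of the
# context is not needed here because the answer span lies inside this prefix).
context_prefix = (
    "בניית פרופילים של משתמשים נחשבת על ידי רבים כאיום פוטנציאלי על הפרטיות. "
    "מסיבה זו הגבילו חלק מהמדינות באמצעות חקיקה את המידע שניתן להשיג באמצעות עוגיות ואת אופן השימוש בעוגיות."
)

span = extract_span(context_prefix, result)
print(span == result["answer"])
```

Because the offsets index characters rather than tokens, they can be used directly to highlight the answer in the original text.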
## Citation
If you use DictaBERT in your research, please cite `DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew`.
**BibTeX:**
```bibtex
@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## License
Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg