dictabert-large-heq / README.md
Shaltiel's picture
Update README.md
7d03e94 verified
metadata
license: cc-by-4.0
language:
  - he

DictaBERT-Large: A State-of-the-Art BERT-Large Suite for Modern Hebrew

State-of-the-art language model for Hebrew, released here.

This is the fine-tuned BERT-large model for the question-answering task using the HeQ dataset.

For the bert-base models for other tasks, see here.

Sample usage:

from transformers import pipeline

oracle = pipeline('question-answering', model='dicta-il/dictabert-large-heq')


context = 'ื‘ื ื™ื™ืช ืคืจื•ืคื™ืœื™ื ืฉืœ ืžืฉืชืžืฉื™ื ื ื—ืฉื‘ืช ืขืœ ื™ื“ื™ ืจื‘ื™ื ื›ืื™ื•ื ืคื•ื˜ื ืฆื™ืืœื™ ืขืœ ื”ืคืจื˜ื™ื•ืช. ืžืกื™ื‘ื” ื–ื• ื”ื’ื‘ื™ืœื• ื—ืœืง ืžื”ืžื“ื™ื ื•ืช ื‘ืืžืฆืขื•ืช ื—ืงื™ืงื” ืืช ื”ืžื™ื“ืข ืฉื ื™ืชืŸ ืœื”ืฉื™ื’ ื‘ืืžืฆืขื•ืช ืขื•ื’ื™ื•ืช ื•ืืช ืื•ืคืŸ ื”ืฉื™ืžื•ืฉ ื‘ืขื•ื’ื™ื•ืช. ืืจืฆื•ืช ื”ื‘ืจื™ืช, ืœืžืฉืœ, ืงื‘ืขื” ื—ื•ืงื™ื ื ื•ืงืฉื™ื ื‘ื›ืœ ื”ื ื•ื’ืข ืœื™ืฆื™ืจืช ืขื•ื’ื™ื•ืช ื—ื“ืฉื•ืช. ื—ื•ืงื™ื ืืœื•, ืืฉืจ ื ืงื‘ืขื• ื‘ืฉื ืช 2000, ื ืงื‘ืขื• ืœืื—ืจ ืฉื ื—ืฉืฃ ื›ื™ ื”ืžืฉืจื“ ืœื™ื™ืฉื•ื ื”ืžื“ื™ื ื™ื•ืช ืฉืœ ื”ืžืžืฉืœ ื”ืืžืจื™ืงืื™ ื ื’ื“ ื”ืฉื™ืžื•ืฉ ื‘ืกืžื™ื (ONDCP) ื‘ื‘ื™ืช ื”ืœื‘ืŸ ื”ืฉืชืžืฉ ื‘ืขื•ื’ื™ื•ืช ื›ื“ื™ ืœืขืงื•ื‘ ืื—ืจื™ ืžืฉืชืžืฉื™ื ืฉืฆืคื• ื‘ืคืจืกื•ืžื•ืช ื ื’ื“ ื”ืฉื™ืžื•ืฉ ื‘ืกืžื™ื ื‘ืžื˜ืจื” ืœื‘ื“ื•ืง ื”ืื ืžืฉืชืžืฉื™ื ืืœื• ื ื›ื ืกื• ืœืืชืจื™ื ื”ืชื•ืžื›ื™ื ื‘ืฉื™ืžื•ืฉ ื‘ืกืžื™ื. ื“ื ื™ืืœ ื‘ืจืื ื˜, ืคืขื™ืœ ื”ื“ื•ื’ืœ ื‘ืคืจื˜ื™ื•ืช ื”ืžืฉืชืžืฉื™ื ื‘ืื™ื ื˜ืจื ื˜, ื—ืฉืฃ ื›ื™ ื”-CIA ืฉืœื— ืขื•ื’ื™ื•ืช ืงื‘ื•ืขื•ืช ืœืžื—ืฉื‘ื™ ืื–ืจื—ื™ื ื‘ืžืฉืš ืขืฉืจ ืฉื ื™ื. ื‘-25 ื‘ื“ืฆืžื‘ืจ 2005 ื’ื™ืœื” ื‘ืจืื ื˜ ื›ื™ ื”ืกื•ื›ื ื•ืช ืœื‘ื™ื˜ื—ื•ืŸ ืœืื•ืžื™ (ื”-NSA) ื”ืฉืื™ืจื” ืฉืชื™ ืขื•ื’ื™ื•ืช ืงื‘ื•ืขื•ืช ื‘ืžื—ืฉื‘ื™ ืžื‘ืงืจื™ื ื‘ื’ืœืœ ืฉื“ืจื•ื’ ืชื•ื›ื ื”. ืœืื—ืจ ืฉื”ื ื•ืฉื ืคื•ืจืกื, ื”ื ื‘ื™ื˜ืœื• ืžื™ื“ ืืช ื”ืฉื™ืžื•ืฉ ื‘ื”ืŸ.'
question = 'ื›ื™ืฆื“ ื”ื•ื’ื‘ืœ ื”ืžื™ื“ืข ืฉื ื™ืชืŸ ืœื”ืฉื™ื’ ื‘ืืžืฆืขื•ืช ื”ืขื•ื’ื™ื•ืช?'

oracle(question=question, context=context)

Output:

{
    "score": 0.9999945163726807,
    "start": 101,
    "end": 114,
    "answer": "ื‘ืืžืฆืขื•ืช ื—ืงื™ืงื”"
}

Citation

If you use DictaBERT in your research, please cite DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

BibTeX:

@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew}, 
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

Shield: CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

CC BY 4.0