ONNX
Hebrew
bert

DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

State-of-the-art language model for Hebrew, described in the paper cited below.

This is the fine-tuned BERT-base model for the named-entity-recognition task.
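As background for the sample usage that follows: with aggregation_strategy='simple', the Hugging Face token-classification pipeline takes the top label for each wordpiece, merges consecutive wordpieces that share an entity type into one group (a B- tag always opens a new group), averages their scores, and drops tokens labelled O. The snippet below is a minimal pure-Python approximation of that grouping, not the library's code. It assumes BIO-style labels such as B-PER, I-PER and O (the model's actual label set is the id2label mapping in its config). TokenPrediction and group_entities are hypothetical names, and the token splits and scores are made up for illustration. Unlike the pipeline, which rebuilds the word field from the wordpieces, this sketch slices it from the input text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TokenPrediction:
    """One wordpiece-level prediction (hypothetical structure for this sketch)."""
    label: str    # BIO label assumed, e.g. "B-PER", "I-PER" or "O"
    score: float  # probability the model gave to that label
    start: int    # character offsets of the wordpiece in the input text
    end: int


def split_tag(label):
    """Split "B-PER" into ("B", "PER"); a bare label such as "O" counts as "I"."""
    if label.startswith("B-"):
        return "B", label[2:]
    if label.startswith("I-"):
        return "I", label[2:]
    return "I", label


def group_entities(text, preds):
    """Merge consecutive predictions of the same type; a B- tag opens a new group."""
    groups, current = [], []
    for p in preds:
        bi, tag = split_tag(p.label)
        if current:
            last_tag = split_tag(current[-1].label)[1]
            if bi == "B" or tag != last_tag:
                groups.append(current)  # close the running group
                current = []
        current.append(p)
    if current:
        groups.append(current)

    entities = []
    for g in groups:
        tag = split_tag(g[0].label)[1]
        if tag == "O":  # tokens outside any entity are dropped
            continue
        entities.append({
            "entity_group": tag,
            "score": mean(p.score for p in g),   # average of the token scores
            "word": text[g[0].start:g[-1].end],  # sliced from the input text
            "start": g[0].start,
            "end": g[-1].end,
        })
    return entities


# Made-up wordpiece predictions for the start of the sample sentence.
text = "דוד בן-גוריון היה מדינאי"
preds = [
    TokenPrediction("B-PER", 0.99, 0, 3),
    TokenPrediction("I-PER", 0.98, 4, 6),
    TokenPrediction("I-PER", 0.97, 6, 7),
    TokenPrediction("I-PER", 0.98, 7, 13),
    TokenPrediction("O", 0.99, 14, 17),
    TokenPrediction("O", 0.99, 18, 24),
]
entities = group_entities(text, preds)
print(entities)
```

Here the four PER wordpieces collapse into a single PER group spanning characters 0 to 13, and the two O tokens are discarded.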

Other BERT-base models in the DictaBERT suite are fine-tuned for other tasks.

Sample usage:

from transformers import pipeline
oracle = pipeline('ner', model='dicta-il/dictabert-ner', aggregation_strategy='simple')
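# With aggregation_strategy='simple', the wordpieces of each entity group are joined back into text,
# so the tokenizer needs a decoder; WordPiece() strips the '##' continuation prefixes.
# Note that the last wordpiece of a group will still be emitted.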
from tokenizers.decoders import WordPiece
oracle.tokenizer.backend_tokenizer.decoder = WordPiece()
sentence = '''דוד בן-גוריון (16 באוקטובר 1886 - ו' בכסלו תשל"ד) היה מדינאי ישראלי וראש הממשלה הראשון של מדינת ישראל.'''
oracle(sentence)

Output:

[
  {
    "entity_group": "PER",
    "score": 0.9999443,
    "word": "דוד בן - גוריון",
    "start": 0,
    "end": 13
  },
  {
    "entity_group": "TIMEX",
    "score": 0.99987966,
    "word": "16 באוקטובר 1886",
    "start": 15,
    "end": 31
  },
  {
    "entity_group": "TIMEX",
    "score": 0.9998579,
    "word": "ו' בכסלו תשל\"ד",
    "start": 34,
    "end": 48
  },
  {
    "entity_group": "TTL",
    "score": 0.99963045,
    "word": "וראש הממשלה",
    "start": 68,
    "end": 79
  },
  {
    "entity_group": "GPE",
    "score": 0.9997943,
    "word": "ישראל",
    "start": 96,
    "end": 101
  }
]
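The word field is rebuilt from wordpieces by the decoder, so it does not always match the input character for character: in the output above, the hyphen in בן-גוריון comes back surrounded by spaces. The start and end fields, however, are character offsets into the original sentence, so slicing with them recovers the exact surface text. A minimal sketch, assuming the pipeline result is a list of dicts with those keys as shown above; surface_text is a hypothetical helper, and the entity list is copied from the sample output rather than produced by running the model.

```python
# Sentence and entity spans copied from the sample above (scores omitted).
sentence = '''דוד בן-גוריון (16 באוקטובר 1886 - ו' בכסלו תשל"ד) היה מדינאי ישראלי וראש הממשלה הראשון של מדינת ישראל.'''

entities = [
    {"entity_group": "PER", "word": "דוד בן - גוריון", "start": 0, "end": 13},
    {"entity_group": "TIMEX", "word": "16 באוקטובר 1886", "start": 15, "end": 31},
    {"entity_group": "TIMEX", "word": "ו' בכסלו תשל\"ד", "start": 34, "end": 48},
    {"entity_group": "TTL", "word": "וראש הממשלה", "start": 68, "end": 79},
    {"entity_group": "GPE", "word": "ישראל", "start": 96, "end": 101},
]


def surface_text(text, entity):
    """Return the exact substring of `text` covered by the entity's offsets."""
    return text[entity["start"]:entity["end"]]


# Pair each entity type with the text exactly as written in the input.
spans = [(e["entity_group"], surface_text(sentence, e)) for e in entities]
for group, span in spans:
    print(group, span)
```

Slicing returns דוד בן-גוריון exactly as written, while the other four spans coincide with their word fields.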

Citation

If you use DictaBERT in your research, please cite "DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew" (arXiv:2308.16687).

BibTeX:

@misc{shmidman2023dictabert,
      title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew}, 
      author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
      year={2023},
      eprint={2308.16687},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
