HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition

HeBERT is a Hebrew pretrained language model. It is based on Google's BERT architecture and uses the BERT-Base configuration.

HeBERT was trained on three datasets:

  1. A Hebrew version of OSCAR: ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
  2. A Hebrew dump of Wikipedia: ~650 MB of data, including over 63 million words and 3.8 million sentences.
  3. Emotion User Generated Content (UGC) data that was collected for the purpose of this study (described in the paper cited below).

Named-entity recognition (NER)

This task measures the model's ability to classify named entities in text, such as person names, organizations, and locations. The model was tested on a labeled dataset from Ben Mordecai and Elhadad (2005) and evaluated with the F1 score.
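
For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute F1 for NER. It assumes entity-level, exact-match scoring over `(start, end, label)` spans; the source does not state which F1 variant was used, and the function name, spans, and labels below are hypothetical illustrations rather than the authors' evaluation code.

```python
from typing import Set, Tuple

# A span is (start offset, end offset, entity label); labels here are hypothetical.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: a prediction counts only if offsets and label both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up gold and predicted spans: two of three predictions are correct.
gold_spans = {(0, 4, "PERS"), (10, 28, "ORG"), (29, 38, "LOC")}
predicted_spans = {(0, 4, "PERS"), (10, 28, "ORG"), (29, 38, "ORG")}
score = span_f1(gold_spans, predicted_spans)
```

In this toy case precision and recall are both 2/3, so the F1 score is also 2/3.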

How to use

    from transformers import pipeline
    
    # Build a token-classification pipeline from the HeBERT NER model and tokenizer
    NER = pipeline(
        "token-classification",
        model="avichr/heBERT_NER",
        tokenizer="avichr/heBERT_NER",
    )
    # Example input: "David studies at the Hebrew University in Jerusalem"
    NER('דויד לומד באוניברסיטה העברית שבירושלים')
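
The pipeline returns a list of dictionaries with `score`, `word`, `start`, and `end` keys, plus `entity` (or `entity_group` when an aggregation strategy is set). The sketch below shows one possible way to turn that output into readable `(label, text, score)` tuples. The helper names, the score threshold, the label names, and the sample predictions are all hypothetical, so the block runs without downloading the model; the actual label set of `avichr/heBERT_NER` may differ.

```python
from typing import Dict, List, Tuple


def extract_entities(
    text: str, predictions: List[Dict], min_score: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Keep predictions scoring at least min_score as (label, surface text, score)."""
    entities = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        # Aggregated output uses "entity_group"; per-token output uses "entity".
        label = pred.get("entity_group", pred.get("entity"))
        start, end = pred.get("start"), pred.get("end")
        # Offsets may be missing (e.g. with a slow tokenizer); fall back to "word".
        surface = text[start:end] if start is not None and end is not None else pred["word"]
        entities.append((label, surface, float(pred["score"])))
    return entities


def fake_prediction(text: str, span: str, label: str, score: float) -> Dict:
    """Build a hypothetical pipeline-style prediction for a substring of text."""
    start = text.index(span)
    return {"entity_group": label, "score": score, "word": span,
            "start": start, "end": start + len(span)}


text = "דויד לומד באוניברסיטה העברית שבירושלים"
# Made-up predictions; labels and scores are illustrative, not real model output.
sample_predictions = [
    fake_prediction(text, "דויד", "PERS", 0.99),
    fake_prediction(text, "באוניברסיטה העברית", "ORG", 0.95),
    fake_prediction(text, "שבירושלים", "LOC", 0.40),
]
entities = extract_entities(text, sample_predictions, min_score=0.5)
```

With a live pipeline, the same helper could be called as `extract_entities(text, NER(text))`. Passing `aggregation_strategy="simple"` when creating the pipeline merges word pieces into whole spans before this step.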

Other tasks

Emotion Recognition Model. An online model can be found at Hugging Face Spaces or as a Colab notebook.
Sentiment Analysis.
Masked-LM model (can be fine-tuned for any downstream task).

Contact us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you use this model, please cite us as:

Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={arXiv preprint arXiv:2102.01909},
  year={2021}
}
