---
language:
- he
pipeline_tag: fill-mask
datasets:
- HeNLP/HeDC4
---

## Hebrew Language Model for Long Documents

State-of-the-art Longformer language model for Hebrew.

#### How to use

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('HeNLP/LongHeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/LongHeRo')

# Tokenization example:

# Tokenizing
tokenized_string = tokenizer('שלום לכולם')

# Decoding
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
```

### Citing

If you use LongHeRo in your research, please cite [HeRo: RoBERTa and Longformer Hebrew Language Models](http://arxiv.org/abs/2304.11077).

```
@article{shalumov2023hero,
  title={HeRo: RoBERTa and Longformer Hebrew Language Models},
  author={Vitaly Shalumov and Harel Haskey},
  year={2023},
  journal={arXiv:2304.11077},
}
```
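
#### Example: filling a masked token

Because the card's `pipeline_tag` is `fill-mask`, the snippet under "How to use" can be extended to predict a masked token. The following is a minimal sketch rather than the authors' own example. It assumes the standard `transformers` `fill-mask` pipeline works with this checkpoint. The helper names `build_masked_text` and `predict_mask` and the `{mask}` placeholder are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple

MODEL_ID = "HeNLP/LongHeRo"  # model name taken from the "How to use" snippet


def build_masked_text(template: str, mask_token: str) -> str:
    """Replace the single '{mask}' placeholder (hypothetical convention) with the mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


def predict_mask(template: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the masked position, best first."""
    # Imported lazily so build_masked_text can be used without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint loads through the generic fill-mask pipeline.
    fill_mask = pipeline("fill-mask", model=MODEL_ID)

    # Read the mask token from the tokenizer instead of hard-coding it.
    text = build_masked_text(template, fill_mask.tokenizer.mask_token)

    # With a single mask, the pipeline returns a list of dicts that
    # include the keys 'token_str' and 'score'.
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=top_k)]
```

Calling `predict_mask('שלום {mask}')` downloads the checkpoint on first use and returns up to `top_k` candidate tokens with their scores. Reading the mask token from the tokenizer keeps the sketch independent of the exact special-token string the model uses.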