---
language:
- he
pipeline_tag: fill-mask
datasets:
- HeNLP/HeDC4
---


## Hebrew Language Model for Long Documents

LongHeRo is a state-of-the-art Longformer language model for Hebrew, intended for long documents.

### How to use

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('HeNLP/LongHeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/LongHeRo')

# Tokenize a Hebrew string ("Hello everyone")
tokenized_string = tokenizer('שלום לכולם')

# Decode the token IDs back to text, dropping special tokens
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
```
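
Since the model is tagged for `fill-mask`, the snippet below is a minimal sketch of how masked-token prediction might look with the `transformers` `pipeline` API. The helper names `top_predictions` and `main` are illustrative and not part of the model or the library, and the sketch assumes the input contains exactly one mask token, which is read from the tokenizer rather than hard-coded.

```python
def top_predictions(fill_mask, text, k=5):
    """Return (token, score) pairs for the single masked position in `text`.

    `fill_mask` is any callable with the fill-mask pipeline interface:
    it takes a string and `top_k`, and returns a list of dicts that
    contain the keys 'token_str' and 'score'.
    """
    return [(r["token_str"], r["score"]) for r in fill_mask(text, top_k=k)]


def main():
    # Imported here so the helper above can be reused without loading transformers.
    from transformers import pipeline

    # Downloads the model from the Hugging Face Hub on first use.
    fill_mask = pipeline("fill-mask", model="HeNLP/LongHeRo")

    # Use the tokenizer's own mask token instead of assuming its spelling.
    mask = fill_mask.tokenizer.mask_token

    # "Hello <mask>": the model proposes candidates for the masked word.
    for token, score in top_predictions(fill_mask, f"שלום {mask}"):
        print(f"{token}\t{score:.4f}")
```

Calling `main()` loads the model and prints the highest-scoring candidates for the masked position; an input with more than one mask token returns a nested result and would need separate handling.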


### Citing

If you use LongHeRo in your research, please cite [HeRo: RoBERTa and Longformer Hebrew Language Models](http://arxiv.org/abs/2304.11077).

```bibtex
@article{shalumov2023hero,
      title={HeRo: RoBERTa and Longformer Hebrew Language Models}, 
      author={Vitaly Shalumov and Harel Haskey},
      year={2023},
      journal={arXiv:2304.11077},
}
```