Update README.md
README.md
CHANGED
@@ -1,3 +1,32 @@
---
license: apache-2.0
language:
- he
library_name: transformers
tags:
- bert
---

# Introducing BEREL 2.0 - New and Improved BEREL: BERT Embeddings for Rabbinic-Encoded Language

When using BEREL 2.0, please cite:

Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel, "Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language", Aug 2022 [arXiv:2208.01875]

1. Usage:

```python
from transformers import AutoTokenizer, BertForMaskedLM

# Load the tokenizer and the masked-language-model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL_2.0')
model = BertForMaskedLM.from_pretrained('dicta-il/BEREL_2.0')
```

> NOTE: This code will **not** work correctly and will produce poor results if you use `BertTokenizer`. Please use `AutoTokenizer` or `BertTokenizerFast` instead.

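As a rough illustration of what can be done with the loaded model, the sketch below predicts candidates for one masked word. The helper names `top_k_indices` and `predict_masked_word` are hypothetical and not part of BEREL or `transformers`. The sketch also assumes that the sentence marks the masked word with the literal placeholder `[MASK]`, which is swapped for the tokenizer's own mask token.

```python
import heapq


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def predict_masked_word(sentence, k=3, model_name="dicta-il/BEREL_2.0"):
    """Return the k highest-scoring tokens for the single masked word in `sentence`.

    Hypothetical helper: `sentence` is assumed to contain exactly one "[MASK]".
    """
    # Heavy dependencies are imported here so that top_k_indices stays usable without them.
    import torch
    from transformers import AutoTokenizer, BertForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)  # not BertTokenizer
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Use whatever mask token the tokenizer defines instead of trusting the placeholder.
    text = sentence.replace("[MASK]", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")

    # Locate the mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if len(mask_positions) != 1:
        raise ValueError("expected exactly one mask token in the sentence")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    scores = logits[0, mask_positions[0]].tolist()
    return tokenizer.convert_ids_to_tokens(top_k_indices(scores, k))
```

For example, `predict_masked_word("משה קבל [MASK] מסיני ומסרה ליהושע", k=3)` would return three candidate tokens for the masked word; the sentence (from Mishnah Avot 1:1) is only an illustrative input, and the first call downloads the model weights.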
2. Demo site:
You can experiment with the model in a GUI here: https://dicta-bert-demo.netlify.app/?genre=rabbinic
- The main part of the GUI consists of word buttons that visualize the tokenization of the sentence. Clicking a button masks that word, and three BEREL word predictions are then shown in a bubble. Clicking the bubble expands it to 10 predictions; alternatively, ctrl-clicking the bubble expands it to 30 predictions.
- Ctrl-clicking adjacent word buttons combines them into a single token for the mask.
- The edit box at the top contains the input sentence; it can be modified at will, and the word buttons adjust accordingly.
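
The masking behaviour described in the list above can be approximated outside the GUI. The following is a minimal sketch, not the demo's actual implementation: `mask_span` and `PREDICTION_COUNTS` are hypothetical names, and the sketch assumes that words are separated by whitespace and that the mask token is `[MASK]`.

```python
# Number of predictions the demo is described as showing at each step.
PREDICTION_COUNTS = {"initial": 3, "click": 10, "ctrl_click": 30}


def mask_span(sentence, start, end=None, mask_token="[MASK]"):
    """Replace the words in positions start..end-1 with a single mask token.

    Masking one word mirrors clicking a word button; masking several adjacent
    words mirrors ctrl-clicking them to combine them into one mask.
    """
    words = sentence.split()  # assumption: whitespace-separated words
    if end is None:
        end = start + 1
    if not 0 <= start < end <= len(words):
        raise ValueError("span is out of range")
    return " ".join(words[:start] + [mask_token] + words[end:])
```

The resulting string can then be passed to a masked language model, requesting as many predictions as the relevant entry in `PREDICTION_COUNTS`.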