Introducing BEREL 2.0 - New and Improved BEREL: BERT Embeddings for Rabbinic-Encoded Language
When using BEREL 2.0, please reference:
Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel, "Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language", Aug 2022 [arXiv:2208.01875]
- Usage:
from transformers import AutoTokenizer, BertForMaskedLM
tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL_2.0')
model = BertForMaskedLM.from_pretrained('dicta-il/BEREL_2.0')
# for evaluation, disable dropout
model.eval()
NOTE: This code will not work correctly and will give poor results if you use BertTokenizer. Please use AutoTokenizer or BertTokenizerFast.
- Demo site: You can experiment with the model in a GUI here: https://dicta-bert-demo.netlify.app/?genre=rabbinic
- The main part of the GUI consists of word buttons that visualize the tokenization of the sentence. Clicking a button masks that word and shows BEREL's top three predictions in a bubble. Clicking the bubble expands it to 10 predictions; ctrl-clicking it instead expands it to 30 predictions.
- Ctrl-clicking adjacent word buttons combines them into a single token for the mask.
- The edit box at the top contains the input sentence; it can be modified at will, and the word buttons update accordingly.