Hebrew Punctuation Model

Introduction

This model is a fine-tuned version of AlephBERT that restores punctuation in transcripts of spoken Hebrew. It was trained specifically for use as a post-processing step on Automatic Speech Recognition (ASR) output, since raw transcriptions usually lack punctuation.

Install

git lfs install 
git clone https://huggingface.co/verbit/hebrew_punctuation
cd hebrew_punctuation
python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Usage

For now, this is the recommended way to use the model. Run it from the root of the cloned repository so that the src package can be imported:

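from transformers import BertTokenizer

# These modules live in the cloned repository (see Install)
from src.models import BertForPunctuation
from src.inference import get_prediction

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation")
tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation")
model.eval()  # inference mode (disables dropout)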

text = """חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה
את התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים בביילורוסיה"""

# Restore punctuation; the context sizes are read from the model config
punct_text = get_prediction(
    model=model,
    text=text,
    tokenizer=tokenizer,
    backward_context=model.config.backward_context,
    forward_context=model.config.forward_context,
)
print(punct_text)
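
When several transcripts need punctuation, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the repository: the name punctuate_all is hypothetical, and it assumes only that get_prediction accepts the keyword arguments shown above and returns a string. The prediction function is passed in as a callable, so the helper can be tried without loading the model.

```python
from typing import Callable, Iterable, List


def punctuate_all(texts: Iterable[str], predict: Callable[[str], str]) -> List[str]:
    """Apply `predict` to each transcript, preserving input order.

    Empty or whitespace-only transcripts are returned unchanged,
    so `predict` is never called on them.
    (Hypothetical helper; not part of the verbit/hebrew_punctuation repo.)
    """
    results = []
    for text in texts:
        stripped = text.strip()
        results.append(predict(stripped) if stripped else text)
    return results


# Usage with the objects from the example above (run inside the cloned repo):
#
# def predict(t: str) -> str:
#     return get_prediction(
#         model=model,
#         text=t,
#         tokenizer=tokenizer,
#         backward_context=model.config.backward_context,
#         forward_context=model.config.forward_context,
#     )
#
# punctuated = punctuate_all([first_transcript, second_transcript], predict)
```

Passing the predictor in as an argument keeps the helper independent of how the model is loaded; whether get_prediction itself supports batching is not stated here, so the sketch simply calls it once per transcript.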

Contact

For any questions or issues, please contact research.team@verbit.ai.
