Hebrew Punctuation model

Introduction

This model is a fine-tuned version of AlephBERT, designed to restore punctuation in Hebrew spoken language transcripts. It is specifically trained as a post-processing step for Automatic Speech Recognition (ASR) outputs, where punctuation is often missing in raw transcriptions.

Install

git lfs install 
git clone https://huggingface.co/verbit/hebrew_punctuation
cd hebrew_punctuation
python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Usage

For now this is the recommended way to use this model:

from transformers import BertTokenizer

from src.models import BertForPunctuation
from src.inference import get_prediction

model = BertForPunctuation.from_pretrained("verbit/hebrew_punctuation")
tokenizer = BertTokenizer.from_pretrained("verbit/hebrew_punctuation")
model.eval()

text = """חברת ורביט פיתחה מערכת לתמלול המבוססת על בינה מלאכותית וגורם אנושי ושוקדת על תמלול עדויות ניצולי שואה 
את התוצאות אפשר לראות כבר ברשת בהן חלקים מעדותו של טוביה ביילסקי שהיה מפקד גדוד הפרטיזנים היהודים בביילורוסיה"""

punct_text = get_prediction(
    model=model,
    text=text,
    tokenizer=tokenizer,
    backward_context=model.config.backward_context,
    forward_context=model.config.forward_context,
)
print(punct_text)

Contact

For any questions or issues, please contact research.team@verbit.ai.

verbit
/

hebrew_punctuation

Hebrew Punctuation model

Introduction

Install

Usage

Contact

Model tree for verbit/hebrew_punctuation