HalleluBERT_large_sentiment_analysis

This model is a fine-tuned version of HalleluBERT/HalleluBERT_large for Hebrew sentiment analysis.

The model was trained on the NNLP-IL/HebrewSentiment dataset.

Final evaluation results:

Loss: 0.3670
Accuracy: 0.8924
Macro F1: 0.8918
Weighted F1: 0.8922

🚀 Use this model

Quickstart with `pipeline`

from transformers import pipeline

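classifier = pipeline(
    task="text-classification",
    model="haimgoldfisher/HalleluBERT_large_sentiment_analysis",
    tokenizer="haimgoldfisher/HalleluBERT_large_sentiment_analysis",
    top_k=None,  # return scores for all classes (replaces the deprecated return_all_scores=True)
)

text = "השירות היה מצוין והאוכל היה טעים מאוד!"  # "The service was excellent and the food was very tasty!"
print(classifier(text))
# Illustrative output; label names come from model.config.id2label and scores will vary:
# [[{'label': 'positive', 'score': 0.98}, {'label': 'neutral', 'score': 0.01}, {'label': 'negative', 'score': 0.01}]]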

Direct loading with `AutoModelForSequenceClassification`

For batching, custom thresholds, or export workflows:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

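texts = [
    "השירות היה מצוין והאוכל היה טעים מאוד!",  # "The service was excellent and the food was very tasty!"
    "החוויה הייתה מאכזבת והמחיר היה גבוה מדי.",  # "The experience was disappointing and the price was too high."
    "ההזמנה הגיעה בזמן.",  # "The order arrived on time."
]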

inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
preds = probs.argmax(dim=-1)
labels = [model.config.id2label[p.item()] for p in preds]

for text, label, prob in zip(texts, labels, probs):
    print(f"{label}\t({prob.max():.3f})\t{text}")
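One way to apply a custom threshold is to fall back to a separate label whenever the top probability is below a cutoff. This is a minimal sketch: it assumes you pass the per-text probability rows (for example `probs.tolist()` from the code above) and the model's `id2label` mapping; `label_with_threshold`, the 0.6 cutoff, and the `"uncertain"` label are hypothetical choices, not values tuned for this model.

```python
from typing import Dict, List, Sequence


def label_with_threshold(
    prob_rows: Sequence[Sequence[float]],
    id2label: Dict[int, str],
    min_confidence: float = 0.6,  # illustrative cutoff, not tuned for this model
    fallback: str = "uncertain",  # hypothetical label for low-confidence predictions
) -> List[str]:
    """Return the argmax label per row, or `fallback` if its probability is below min_confidence."""
    labels = []
    for row in prob_rows:
        best_idx = max(range(len(row)), key=lambda i: row[i])
        if row[best_idx] < min_confidence:
            labels.append(fallback)
        else:
            labels.append(id2label[best_idx])
    return labels


# Usage with the tensors from the example above:
# labels = label_with_threshold(probs.tolist(), model.config.id2label)
```

The cutoff is best chosen on held-out data, since a single global threshold trades coverage against precision differently for each class.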

GPU / half-precision

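HalleluBERT Large has roughly 0.4B parameters, so on GPU it is worth loading the weights in fp16, which typically gives a sizeable throughput gain (benchmark on your own hardware). Keep fp32 on CPU: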

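import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,  # fp16 on GPU only
).to(device).eval()

# Inputs must be moved to the same device as the model ("The order arrived on time.")
inputs = tokenizer(["ההזמנה הגיעה בזמן."], truncation=True, max_length=128, return_tensors="pt").to(device)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits.float(), dim=-1)  # upcast fp16 logits before softmax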
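For larger workloads, throughput also depends on batching. A minimal sketch, assuming `model`, `tokenizer`, and `device` are loaded as in the examples above; `chunked` and `predict_batched` are hypothetical helpers, and `batch_size=32` is an arbitrary starting point rather than a measured optimum.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def predict_batched(texts, model, tokenizer, device, batch_size=32):
    """Return per-class probabilities (lists of floats) for every text, one batch at a time."""
    import torch  # imported lazily; only needed when the model actually runs

    all_probs = []
    for batch in chunked(texts, batch_size):
        inputs = tokenizer(batch, padding=True, truncation=True, max_length=128,
                           return_tensors="pt").to(device)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Cast to fp32 before softmax so fp16 logits do not lose precision
        all_probs.extend(torch.softmax(logits.float(), dim=-1).tolist())
    return all_probs
```

Padding is per batch, so sorting texts by length before batching usually reduces wasted compute further.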

🌐 Deploy

Inference Providers status: This model isn't currently deployed by any Hugging Face Inference Provider (Novita, Together, Hyperbolic, etc.), so the serverless widget on the model page may show "This model isn't deployed by any Inference Provider." You can request provider support through the linked discussion on the model page; react with 👍 to upvote it.

In the meantime, all four options below work today:

Option 1 — Inference Endpoints (recommended, HF-hosted)

Dedicated HF infrastructure that works for any model on the Hub, with no provider listing required. Click Deploy → Inference Endpoints on the model page, or script the deployment after authenticating with the CLI:

huggingface-cli login

Recommended starting config for a Large-size model:

Hardware: GPU T4 (cost-efficient) or A10G (low-latency)
CPU fallback: Intel Sapphire Rapids — only for < 10 req/min
Replicas: 1 (autoscale 1→3)
Task: text-classification
Max input length: 128 tokens
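The starting config above can also be scripted with `HfApi.create_inference_endpoint` from `huggingface_hub` once you are logged in. This is a sketch: the endpoint name, vendor (`aws`), region (`us-east-1`), and the catalog strings `nvidia-t4` / `x1` are assumptions, so check the options the Inference Endpoints UI currently offers. The 128-token input limit is not configured here.

```python
MODEL_ID = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"


def endpoint_kwargs(name: str = "hallelubert-sentiment") -> dict:
    """Keyword arguments mirroring the recommended starting config (several values are assumptions)."""
    return {
        "name": name,                  # hypothetical endpoint name
        "repository": MODEL_ID,
        "framework": "pytorch",
        "task": "text-classification",
        "accelerator": "gpu",
        "vendor": "aws",               # assumed cloud vendor
        "region": "us-east-1",         # assumed region
        "instance_type": "nvidia-t4",  # assumed catalog name for a T4 GPU
        "instance_size": "x1",         # assumed catalog size
        "min_replica": 1,
        "max_replica": 3,              # autoscale 1 -> 3
    }


def deploy() -> str:
    """Create the endpoint, wait until it is running, and return its URL."""
    # Requires `pip install huggingface_hub` and a prior `huggingface-cli login`
    from huggingface_hub import HfApi

    endpoint = HfApi().create_inference_endpoint(**endpoint_kwargs())
    endpoint.wait()  # blocks until the endpoint reports it is running
    return endpoint.url


# Usage (creates a billable resource):
# endpoint_url = deploy()
```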

Call it once running:

curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "השירות היה מצוין והאוכל היה טעים מאוד!"}'
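The same call from Python, using only the standard library. `ENDPOINT_URL` is the same placeholder as in the curl call, and the token is read from the `HF_TOKEN` environment variable; `build_request` and `classify` are hypothetical helpers, split so the request can be inspected without sending anything.

```python
import json
import os
import urllib.request
from typing import Any, Optional

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder, as in the curl call


def build_request(text: str, url: str = ENDPOINT_URL, token: Optional[str] = None) -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent here)."""
    if token is None:
        token = os.environ.get("HF_TOKEN", "")  # same variable the curl call uses
    body = json.dumps({"inputs": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, url: str = ENDPOINT_URL, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`classify` raises `urllib.error.HTTPError` on non-2xx responses, for example while the endpoint is still initializing.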

Option 2 — Docker (self-hosted with TEI)

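text-embeddings-inference (TEI) supports BERT/RoBERTa sequence classifiers and typically offers low self-hosted latency. The `1.5` image tag below targets Ampere GPUs such as the A100; TEI publishes other tags for other hardware (for example `turing-1.5` for T4, `86-1.5` for A10/A10G, or `cpu-1.5` for CPU-only hosts, in which case drop `--gpus all`):

docker run -p 8080:80 \
  -v "$PWD/data":/data \
  --gpus all \
  ghcr.io/huggingface/text-embeddings-inference:1.5 \
  --model-id haimgoldfisher/HalleluBERT_large_sentiment_analysis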

Call it:

curl http://localhost:8080/predict \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "השירות היה מצוין והאוכל היה טעים מאוד!"}'

Option 3 — Minimal FastAPI server

For full control or to add custom pre/post-processing:

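# server.py
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup: first GPU if available, otherwise CPU (-1)
clf = pipeline(
    "text-classification",
    model="haimgoldfisher/HalleluBERT_large_sentiment_analysis",
    top_k=None,  # return scores for all classes
    device=0 if torch.cuda.is_available() else -1,
)

class Payload(BaseModel):
    inputs: str | list[str]  # one text or a batch of texts (Python 3.10+ syntax)

@app.post("/predict")
def predict(p: Payload):
    # Truncate to the same 128-token limit used elsewhere in this card
    return clf(p.inputs, truncation=True, max_length=128)

Install the dependencies and start the server:

pip install fastapi uvicorn transformers torch
uvicorn server:app --host 0.0.0.0 --port 8080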

Option 4 — ONNX / quantized for edge & CPU

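A Large model is heavy on CPU. ONNX export plus INT8 dynamic quantization typically cuts latency by around 3–4×, though the gain depends on the CPU, so benchmark on your own hardware and re-check accuracy after quantizing: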

from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer
from transformers import AutoTokenizer

model_id = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"

# Export to ONNX
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.save_pretrained("./onnx-halleluBERT-sentiment")
tokenizer.save_pretrained("./onnx-halleluBERT-sentiment")

# INT8 dynamic quantization
# avx512_vnni targets CPUs with AVX-512 VNNI; AutoQuantizationConfig also has avx2(), avx512() and arm64() presets
quantizer = ORTQuantizer.from_pretrained("./onnx-halleluBERT-sentiment")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="./onnx-halleluBERT-sentiment-int8", quantization_config=qconfig)
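To run the quantized model, load it back with `ORTModelForSequenceClassification` and wrap it in a regular `pipeline`. A sketch, assuming `quantize()` wrote a file whose name contains `quantized` (by default `model_quantized.onnx`) plus the model config into the INT8 directory; the tokenizer is loaded from the fp32 export directory, where the code above saved it. `find_quantized_onnx` and `load_int8_classifier` are hypothetical helpers.

```python
from pathlib import Path

ONNX_DIR = "./onnx-halleluBERT-sentiment"       # fp32 export (also holds the tokenizer)
INT8_DIR = "./onnx-halleluBERT-sentiment-int8"  # output directory of quantizer.quantize(...)


def find_quantized_onnx(model_dir: str) -> str:
    """Return the name of the quantized ONNX file in model_dir."""
    candidates = sorted(p.name for p in Path(model_dir).glob("*quantized*.onnx"))
    if not candidates:
        raise FileNotFoundError(f"no *quantized*.onnx file found in {model_dir}")
    return candidates[0]


def load_int8_classifier(model_dir: str = INT8_DIR, tokenizer_dir: str = ONNX_DIR):
    """Build a text-classification pipeline backed by the INT8 ONNX model."""
    # Imported lazily so the helper above works without optimum installed
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    model = ORTModelForSequenceClassification.from_pretrained(
        model_dir, file_name=find_quantized_onnx(model_dir)
    )
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    return pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)


# Usage (after running the export and quantization steps above):
# clf = load_int8_classifier()
# print(clf("ההזמנה הגיעה בזמן."))  # "The order arrived on time."
```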

Model description

This model performs sentiment classification for Hebrew text.

It is based on HalleluBERT Large, a RoBERTa-style transformer model pretrained specifically for Hebrew.

The model was fine-tuned for a 3-class sentiment classification task:

- Positive
- Negative
- Neutral

A classification head was added on top of the first-token representation (`<s>`, RoBERTa's counterpart of BERT's [CLS] token), and the entire model was fine-tuned end-to-end.
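In `transformers`, `RobertaForSequenceClassification` uses `RobertaClassificationHead`, which takes the first-token vector and applies dropout, a dense layer with tanh, dropout again, and a final projection to the labels. The sketch below re-creates that structure in plain PyTorch. It assumes this checkpoint uses the standard RoBERTa head (check `model.config.architectures`) and a hidden size of 1024, typical for Large models; it is illustrative, not the training code.

```python
import torch
from torch import nn


class RobertaStyleClassificationHead(nn.Module):
    """Sketch of a RoBERTa-style head: first-token vector -> dense -> tanh -> linear."""

    def __init__(self, hidden_size: int = 1024, num_labels: int = 3, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); keep only the <s> (first) token
        x = hidden_states[:, 0, :]
        x = self.dropout(x)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)  # logits: (batch, num_labels)


head = RobertaStyleClassificationHead()
logits = head(torch.randn(2, 16, 1024))  # 2 sequences of 16 tokens each
print(logits.shape)  # torch.Size([2, 3])
```

Because the whole model was fine-tuned end-to-end, both this head and the encoder weights beneath it were updated during training.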

Intended uses & limitations

Intended uses

This model is suitable for:

- Sentiment analysis of Hebrew text
- Social media monitoring
- Customer feedback analysis
- Review classification
- General Hebrew NLP research

Limitations

The model was trained on a specific sentiment dataset and may not generalize perfectly to all domains.
Performance may degrade on:
- highly informal slang
- mixed Hebrew/English text
- very long documents
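The model is intended for single-sentence or short-paragraph inputs; with truncation at 128 tokens (as in the examples above), any text past that limit is ignored.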
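For longer documents, one common workaround (not something this model was evaluated on) is to split the text into short chunks, classify each chunk, and average the per-class scores. A sketch, assuming a `classifier` pipeline that returns scores for all classes, as in the quickstart. The 60-word chunk size is a rough assumption meant to stay under the 128-token limit for typical Hebrew text (verify with the tokenizer), and both helpers are hypothetical.

```python
from typing import Dict, List


def split_into_chunks(text: str, max_words: int = 60) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def average_scores(chunk_scores: List[List[dict]]) -> Dict[str, float]:
    """Average per-label scores across chunks; each chunk is a list of {'label', 'score'} dicts."""
    if not chunk_scores:
        return {}
    totals: Dict[str, float] = {}
    for scores in chunk_scores:
        for item in scores:
            totals[item["label"]] = totals.get(item["label"], 0.0) + item["score"]
    return {label: total / len(chunk_scores) for label, total in totals.items()}


# Usage with the quickstart pipeline:
# doc_scores = average_scores(classifier(split_into_chunks(long_text)))
```

Plain averaging treats every chunk equally; weighting chunks by length, or taking the most confident chunk, are alternatives worth comparing on your own data.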

Training and evaluation data

Training was performed using the HebrewSentiment dataset: https://github.com/NNLP-IL/HebrewSentiment

The dataset contains labeled Hebrew sentences with sentiment annotations.

Dataset characteristics:

Language: Hebrew
Task: sentiment classification
Labels:
- Positive
- Negative
- Neutral

The dataset was split into:

- Training set
- Validation set

Evaluation metrics:

- Accuracy
- Macro F1
- Weighted F1

Macro F1 was used as the primary metric for model selection: it averages the per-class F1 scores with equal weight, so it is less dominated by the majority class than accuracy or weighted F1 when classes are imbalanced.
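To make the difference between the two F1 averages concrete, here is a from-scratch sketch of both. It should match scikit-learn's `f1_score` with `average="macro"` / `average="weighted"` and `zero_division=0`; the toy labels in the usage note are made up.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 for every label that appears in y_true or y_pred (0.0 when undefined)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1.values()) / len(f1)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 weighted by each class's support (its count in y_true)."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(f1[label] * support[label] for label in f1) / len(y_true)
```

On a toy set with 4 `pos`, 1 `neg`, and 1 `neu` gold labels where the classifier always predicts `pos`, accuracy is 0.667 and weighted F1 is about 0.533, but macro F1 drops to about 0.267, exposing that the minority classes are never predicted.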

Framework versions

Transformers 5.7.0
PyTorch 2.11.0+cu130
Datasets 4.8.5
Tokenizers 0.22.2


Model size: 0.4B params (Safetensors, tensor type F32)

Base model: HalleluBERT/HalleluBERT_large (this model is a fine-tune of it)

Evaluation results (self-reported on HebrewSentiment, NNLP-IL)

Accuracy: 0.892
Macro F1: 0.892
Weighted F1: 0.892