HalleluBERT_large_sentiment_analysis

This model is a fine-tuned version of HalleluBERT/HalleluBERT_large for Hebrew sentiment analysis.

The model was trained on the NNLP-IL/HebrewSentiment dataset.

Final evaluation results:

  • Loss: 0.3670
  • Accuracy: 0.8924
  • Macro F1: 0.8918
  • Weighted F1: 0.8922

🚀 Use this model

Quickstart with pipeline

from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="haimgoldfisher/HalleluBERT_large_sentiment_analysis",
    tokenizer="haimgoldfisher/HalleluBERT_large_sentiment_analysis",
    top_k=None,  # return scores for all labels (replaces the deprecated return_all_scores=True)
)

text = "השירות היה מצוין והאוכל היה טעים מאוד!"
print(classifier(text))
# [[{'label': 'positive', 'score': 0.98}, {'label': 'neutral', 'score': 0.01}, {'label': 'negative', 'score': 0.01}]]

Direct loading with AutoModel

For batching, custom thresholds, or export workflows:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

texts = [
    "השירות היה מצוין והאוכל היה טעים מאוד!",
    "החוויה הייתה מאכזבת והמחיר היה גבוה מדי.",
    "ההזמנה הגיעה בזמן.",
]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
preds = probs.argmax(dim=-1)
labels = [model.config.id2label[p.item()] for p in preds]

for text, label, prob in zip(texts, labels, probs):
    print(f"{label}\t({prob.max():.3f})\t{text}")
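
The "custom thresholds" case mentioned above can be sketched as follows. This is a minimal, framework-free illustration: the `apply_threshold` helper, the 0.6 cutoff, the "uncertain" fallback label, and the example label mapping are assumptions made for the example, not part of the model or of Transformers. In practice each row would come from `probs.tolist()` and the mapping from `model.config.id2label`.

```python
def apply_threshold(prob_row, id2label, threshold=0.6, fallback="uncertain"):
    """Return (label, confidence); use `fallback` when the top score is below `threshold`."""
    best_id = max(range(len(prob_row)), key=lambda i: prob_row[i])
    confidence = prob_row[best_id]
    label = id2label[best_id] if confidence >= threshold else fallback
    return label, confidence


# Hypothetical label mapping and probabilities, for illustration only.
example_id2label = {0: "negative", 1: "neutral", 2: "positive"}
example_rows = [
    [0.02, 0.03, 0.95],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous: top score is below the cutoff
]

decisions = [apply_threshold(row, example_id2label) for row in example_rows]
for label, confidence in decisions:
    print(f"{label}\t({confidence:.3f})")
```

Routing low-confidence texts to a fallback label keeps borderline cases out of automated reports; the right cutoff depends on the application and is best tuned on held-out data.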

GPU / half-precision

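HalleluBERT-Large has ~355M parameters, so fp16 inference on a GPU is usually worthwhile (roughly 2× the fp32 throughput, depending on hardware and batch size):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,  # fp16 only on GPU
).to(device).eval()

# Inputs must be moved to the same device as the model
inputs = tokenizer("ההזמנה הגיעה בזמן.", return_tensors="pt").to(device)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits.float(), dim=-1)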

🌐 Deploy

Inference Providers status: This model is not currently deployed by any Hugging Face Inference Provider (Novita, Together, Hyperbolic, etc.), so the serverless widget on the model page may show "This model isn't deployed by any Inference Provider." You can request provider support through the discussion linked from that widget; react with 👍 to the discussion to upvote it.

In the meantime, all four options below work today:

Option 1 — Inference Endpoints (recommended, HF-hosted)

Dedicated HF infrastructure that works for any model on the Hub, with no provider listing required. Click Deploy → Inference Endpoints on the model page. If you manage endpoints from the command line or from code instead, authenticate first:

huggingface-cli login

Recommended starting config for a Large-size model:

  • Hardware: GPU T4 (cost-efficient) or A10G (low-latency)
  • CPU fallback: Intel Sapphire Rapids — only for < 10 req/min
  • Replicas: 1 (autoscale 1→3)
  • Task: text-classification
  • Max input length: 128 tokens

Call it once running:

curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "השירות היה מצוין והאוכל היה טעים מאוד!"}'
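
The same request can be sent from Python. The sketch below uses only the standard library; `build_request` and `send` are hypothetical helper names, the URL is the placeholder from the curl example, and the token is assumed to be available in the `HF_TOKEN` environment variable. Nothing is sent until `send` is called, so the request can be inspected without a running endpoint.

```python
import json
import os
import urllib.request


def build_request(endpoint_url, token, text):
    """Build the same POST request as the curl example above."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running endpoint)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder URL: replace it with the URL shown for your endpoint.
example_request = build_request(
    "https://<your-endpoint>.endpoints.huggingface.cloud",
    os.environ.get("HF_TOKEN", ""),
    "השירות היה מצוין והאוכל היה טעים מאוד!",
)
# print(send(example_request))  # uncomment once the endpoint is running
```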

Option 2 — Docker (self-hosted with TEI)

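text-embeddings-inference (TEI) supports BERT/RoBERTa sequence classifiers and typically gives low self-hosted latency. The image tag is hardware-specific, so check the TEI documentation for the tag that matches your GPU (or CPU) before running: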

docker run -p 8080:80 \
  -v $PWD/data:/data \
  --gpus all \
  ghcr.io/huggingface/text-embeddings-inference:1.5 \
  --model-id haimgoldfisher/HalleluBERT_large_sentiment_analysis

Call it:

curl http://localhost:8080/predict \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "השירות היה מצוין והאוכל היה טעים מאוד!"}'

Option 3 — Minimal FastAPI server

For full control or to add custom pre/post-processing:

# server.py
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
clf = pipeline(
    "text-classification",
    model="haimgoldfisher/HalleluBERT_large_sentiment_analysis",
    top_k=None,  # return scores for all labels (replaces the deprecated return_all_scores=True)
    device=0,  # set to -1 for CPU
)

class Payload(BaseModel):
    inputs: str | list[str]

@app.post("/predict")
def predict(p: Payload):
    return clf(p.inputs)

Install the dependencies and start the server:

pip install fastapi uvicorn transformers torch
uvicorn server:app --host 0.0.0.0 --port 8080

Option 4 — ONNX / quantized for edge & CPU

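A Large model is heavy on CPU. ONNX export with INT8 dynamic quantization typically cuts CPU latency by 3–4×, usually at a small cost in accuracy, so re-check the evaluation metrics after quantizing: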

from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer
from transformers import AutoTokenizer

model_id = "haimgoldfisher/HalleluBERT_large_sentiment_analysis"

# Export to ONNX
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.save_pretrained("./onnx-halleluBERT-sentiment")
tokenizer.save_pretrained("./onnx-halleluBERT-sentiment")

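# INT8 dynamic quantization.
# avx512_vnni targets x86 CPUs with AVX-512 VNNI; AutoQuantizationConfig also
# provides avx2, avx512, and arm64 presets for other hardware.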
quantizer = ORTQuantizer.from_pretrained("./onnx-halleluBERT-sentiment")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="./onnx-halleluBERT-sentiment-int8", quantization_config=qconfig)

Model description

This model performs sentiment classification for Hebrew text.

It is based on HalleluBERT Large, a RoBERTa-style transformer model pretrained specifically for Hebrew.

The model was fine-tuned for a 3-class sentiment classification task:

  • Positive
  • Negative
  • Neutral

A classification head was added on top of the [CLS] token representation and the entire model was fine-tuned end-to-end.


Intended uses & limitations

Intended uses

This model is suitable for:

  • Sentiment analysis of Hebrew text
  • Social media monitoring
  • Customer feedback analysis
  • Review classification
  • General Hebrew NLP research

Limitations

  • The model was trained on a specific sentiment dataset and may not generalize perfectly to all domains.
  • Performance may degrade on:
    • highly informal slang
    • mixed Hebrew/English text
    • very long documents
  • The model assumes single-sentence or short paragraph inputs.

Training and evaluation data

Training was performed using the HebrewSentiment dataset: https://github.com/NNLP-IL/HebrewSentiment

The dataset contains labeled Hebrew sentences with sentiment annotations.

Dataset characteristics:

  • Language: Hebrew
  • Task: sentiment classification
  • Labels:
    • Positive
    • Negative
    • Neutral

The dataset was split into:

  • Training set
  • Validation set

Evaluation metrics:

  • Accuracy
  • Macro F1
  • Weighted F1

Macro F1 was used as the primary metric for model selection: it averages the per-class F1 scores with equal weight, so minority classes count as much as the majority class when the data is imbalanced.
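
The difference between the metrics listed above can be seen on a small worked example. The sketch below is plain Python with invented toy labels (not taken from HebrewSentiment), and the helper names are hypothetical. It scores a degenerate classifier that always predicts "positive" on an imbalanced sample.

```python
def per_class_f1(y_true, y_pred, labels):
    """F1 for each label, computed from true/false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_f1(scores, y_true):
    """Mean weighted by class support (number of true examples per class)."""
    return sum(scores[label] * y_true.count(label) for label in scores) / len(y_true)


# Invented toy data: 8 positive, 1 negative, 1 neutral; the classifier always says "positive".
labels = ["positive", "negative", "neutral"]
y_true = ["positive"] * 8 + ["negative", "neutral"]
y_pred = ["positive"] * 10

scores = per_class_f1(y_true, y_pred, labels)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = macro_f1(scores)
weighted = weighted_f1(scores, y_true)
print(f"accuracy={accuracy:.3f}  macro_f1={macro:.3f}  weighted_f1={weighted:.3f}")
# accuracy=0.800  macro_f1=0.296  weighted_f1=0.711
```

Accuracy and weighted F1 still look acceptable here because the majority class dominates them, while macro F1 drops sharply because two of the three classes are never predicted correctly.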


Framework versions

  • Transformers 5.7.0
  • PyTorch 2.11.0+cu130
  • Datasets 4.8.5
  • Tokenizers 0.22.2