AlephBERT Hebrew Intent Classifier · ONNX

ONNX-optimized variant of spivi87/alephbert-intent-he for ~50 ms CPU inference. Identical weights and labels — just packaged as a runtime-portable graph.

For most users: prefer the PyTorch repo plus transformers.pipeline("text-classification", ...). It is a 5-line snippet, and the pipeline handles tokenization, padding, and softmax for you. Use this repo when you need predictable CPU latency (production webhooks, edge devices, free-tier servers).

Usage

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
tok = Tokenizer.from_file("tokenizer.json")
enc = tok.encode("תוסיף חלב וביצים")

# IMPORTANT: use enc.attention_mask — the tokenizer pads to 128 by default,
# so a naive all-ones mask attends to PAD tokens and tanks accuracy to ~16%.
input_ids = np.array([enc.ids], dtype=np.int64)
attention_mask = np.array([enc.attention_mask], dtype=np.int64)

logits = session.run(
    None, {"input_ids": input_ids, "attention_mask": attention_mask}
)[0]
probs = np.exp(logits[0] - logits[0].max())
probs /= probs.sum()

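# id2label is in config.json (same as the PyTorch repo)
import json
with open("config.json", encoding="utf-8") as f:  # close the file handle deterministically
    config = json.load(f)
# JSON object keys are strings, so convert them back to integer class ids
id2label = {int(k): v for k, v in config["id2label"].items()}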
print(id2label[int(np.argmax(probs))])  # → GROCERY_REQUEST
print(f"confidence: {float(probs.max()):.3f}")
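
The attention-mask warning in the snippet above matters most when the ids reach you without the tokenizer's own mask, for example from a cache or another service. A minimal sketch of rebuilding the mask from the ids follows. It assumes the PAD id is 0, as in standard BERT vocabularies; `PAD_ID` and `mask_from_ids` are hypothetical names, and the pad id should be checked against this tokenizer before relying on it.

```python
import numpy as np

# Assumption: PAD id is 0, as in standard BERT vocabularies. Verify it for this
# tokenizer (e.g. tok.token_to_id("[PAD]")) before relying on this value.
PAD_ID = 0


def mask_from_ids(input_ids: np.ndarray, pad_id: int = PAD_ID) -> np.ndarray:
    """Return an int64 mask: 1 where a real token sits, 0 where the id equals pad_id."""
    return (input_ids != pad_id).astype(np.int64)


# Toy row padded to length 8; the non-zero ids are made up for illustration.
ids = np.array([[2, 1543, 871, 3, 0, 0, 0, 0]], dtype=np.int64)
mask = mask_from_ids(ids)
```

When `enc.attention_mask` is available, it remains the safer choice, since it does not depend on guessing the pad id.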

Batched inference

texts = ["תוסיף חלב", "מה ברשימה?", "סיימתי קניות"]
encs = tok.encode_batch(texts)

input_ids      = np.array([e.ids            for e in encs], dtype=np.int64)
attention_mask = np.array([e.attention_mask for e in encs], dtype=np.int64)

logits = session.run(
    None, {"input_ids": input_ids, "attention_mask": attention_mask}
)[0]
preds = np.argmax(logits, axis=-1)
print([id2label[int(p)] for p in preds])
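
The batched snippet above takes the argmax of the raw logits, which yields labels but no confidences. A row-wise softmax, sketched below, gives one probability vector per input. The `softmax` helper and the `demo_*` arrays are illustrative names with made-up values, standing in for the `logits` returned by `session.run`.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    # Subtract the per-row maximum so np.exp never overflows.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Stand-in for session.run(...)[0]: two inputs, three classes (made-up values).
demo_logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]], dtype=np.float32)

demo_probs = softmax(demo_logits)        # shape (batch, num_labels)
demo_preds = demo_probs.argmax(axis=-1)  # predicted class id per row
demo_conf = demo_probs.max(axis=-1)      # confidence of the predicted class
```

With the real model, `softmax(logits)` replaces `demo_probs`, and `id2label[int(p)]` maps each entry of the argmax to its label as before.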

Performance

On Apple M3 (CPU, ONNX Runtime 1.x): ~50 ms per single inference, with latency scaling linearly with batch size. For accuracy and F1, see spivi87/alephbert-intent-he; the ONNX export is validated to match PyTorch logits within atol=1e-4 on the test sentences.
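
Latency depends on the host, so it is worth re-measuring on your own hardware. The sketch below is one way to do that, not the procedure used for the figure above: `measure_latency_ms` is a hypothetical helper that times any zero-argument callable after a few warm-up calls, and the demo workload is a stand-in so the block runs without the model files.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> dict:
    """Time fn() over `runs` calls after `warmup` untimed calls; results in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (lazy initialisation, caches)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Approximate 95th percentile: pick the sample at index floor(0.95 * n), clamped.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "runs": runs,
    }


# Stand-in workload so the sketch runs on its own.
stats = measure_latency_ms(lambda: sum(range(10_000)), warmup=2, runs=20)
```

To time the model itself, pass `lambda: session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})` as `fn`, reusing the arrays built in the usage section.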

Attribution & License

Apache 2.0. Built on onlplab/alephbert-base (also Apache 2.0). See the GitHub repo for the full reproducible recipe.
