Hebrew Manuscript Genre Classifier

A multi-label classifier that predicts genre labels for Hebrew manuscripts, with an independent sigmoid score per label. It is used as a fallback for Wikidata property P136 (genre) in the Wikidata Studio of the MHM (Mapping Hebrew Manuscripts) Pipeline.

Labels (9)

Piyyutim, Poetry, Illustrated works (Manuscript), Personal correspondence, Censored manuscripts, Autograph manuscripts, Records (Documents), Bibliographies, __NOTA__

Recommended threshold

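0.65 on the sigmoid score. If you threshold pre-sigmoid logits instead, the equivalent cutoff is logit > ln(0.65 / 0.35) ≈ 0.62; a cutoff of logit > 0 corresponds to a sigmoid score of 0.5 and is therefore more permissive.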
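
As a minimal sketch of how the threshold might be applied, the snippet below converts logits to sigmoid scores, keeps every label that reaches 0.65, and computes the logit value that corresponds to a given sigmoid threshold. The helper names (`sigmoid`, `logit_cutoff`, `select_labels`) and the example logits are hypothetical and are not taken from the MHM Pipeline.

```python
import math

THRESHOLD = 0.65  # recommended sigmoid-score threshold from this card

def sigmoid(logit: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def logit_cutoff(threshold: float) -> float:
    """Logit value whose sigmoid equals `threshold` (inverse sigmoid)."""
    return math.log(threshold / (1.0 - threshold))

def select_labels(logits: dict[str, float], threshold: float = THRESHOLD) -> list[str]:
    """Return every label whose sigmoid score reaches the threshold."""
    return [label for label, z in logits.items() if sigmoid(z) >= threshold]

# Hypothetical logits for one manuscript record (illustrative values only).
example_logits = {"Piyyutim": 2.1, "Poetry": 0.3, "Bibliographies": -1.7}

selected = select_labels(example_logits)
cutoff = logit_cutoff(THRESHOLD)
print(selected)
print(round(cutoff, 3))
```

In this toy input, "Poetry" has a positive logit (0.3) but a sigmoid score of roughly 0.57, so it is not selected at the 0.65 threshold.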

Performance

Best fold F1 on the desktop training corpus: 0.921.
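
The card does not state how F1 is averaged across labels. Purely as an illustration, the sketch below computes micro-averaged F1 for multi-label predictions, which is one common choice; the function name `micro_f1` and the toy label sets are hypothetical and do not reproduce the reported score.

```python
def micro_f1(true_sets: list[set[str]], pred_sets: list[set[str]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and
    false negatives over all examples and labels before computing F1."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy data: three records with gold and predicted label sets.
y_true = [{"Poetry", "Piyyutim"}, {"Bibliographies"}, {"Personal correspondence"}]
y_pred = [{"Poetry"}, {"Bibliographies"}, {"Personal correspondence", "Poetry"}]

score = micro_f1(y_true, y_pred)
print(round(score, 3))
```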

Architecture

DictaBERT encoder with a single Linear classification head, configured with problem_type="multi_label_classification". This is a standard architecture, so it can be used directly through Hugging Face's Inference Providers serverless tier:

from huggingface_hub import InferenceClient
client = InferenceClient(token="hf_xxx")
print(client.text_classification(
    text="...",
    model="alexgoldberg/hebrew-manuscript-genre-classifier",
))
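
In the Transformers library, problem_type="multi_label_classification" selects a binary cross-entropy loss over logits (BCEWithLogitsLoss), so each label is scored independently rather than competing through a softmax. The following is a dependency-free sketch of that loss for a single example; the function name `bce_with_logits` and the sample values are illustrative and are not taken from the training pipeline.

```python
import math

def bce_with_logits(logits: list[float], targets: list[float]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# One example with three labels; targets are multi-hot (several may be 1).
logits = [2.0, -1.0, 0.0]
targets = [1.0, 0.0, 1.0]

loss = bce_with_logits(logits, targets)
print(round(loss, 4))
```

Because the loss treats every label as its own yes/no decision, a manuscript can receive several genre labels at once, or none.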

Citation

Trained as part of the MHM Pipeline at Bar-Ilan University. See https://github.com/alexgoldberg/mhm-pipeline for the full training pipeline.

Model size

0.2B params, tensor type F32, stored as Safetensors.