Hebrew Manuscript Genre Classifier
Multi-label sigmoid classifier predicting genre labels for Hebrew manuscripts. Used as a P136 fallback in the MHM (Mapping Hebrew Manuscripts) Pipeline's Wikidata Studio.
Labels (9)
Piyyutim, Poetry, Illustrated works (Manuscript), Personal correspondence, Censored manuscripts, Autograph manuscripts, Records (Documents), Bibliographies, __NOTA__
Recommended threshold
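0.65 on the sigmoid score. When working with pre-sigmoid logits,
the equivalent cutoff is logit > ln(0.65 / 0.35) ≈ 0.62. Note that
logit > 0 corresponds to a sigmoid score of 0.5, not 0.65.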
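The relationship between a sigmoid score threshold and a logit threshold can be checked with the inverse sigmoid. The following is a minimal sketch using only the standard library; the helper names (`sigmoid`, `logit`, `predict_from_logits`) are illustrative and not part of the model's code.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse sigmoid: map a score in (0, 1) back to a raw logit."""
    return math.log(p / (1.0 - p))


SCORE_THRESHOLD = 0.65                     # recommended sigmoid threshold
LOGIT_THRESHOLD = logit(SCORE_THRESHOLD)   # roughly 0.619


def predict_from_logits(logits, threshold=SCORE_THRESHOLD):
    """Return one boolean per label, thresholding in logit space.

    Comparing each logit to logit(threshold) gives the same decisions
    as comparing sigmoid(logit) to the score threshold.
    """
    cutoff = logit(threshold)
    return [z > cutoff for z in logits]
```

Thresholding logits at 0 is the same as thresholding scores at 0.5, so it accepts more labels than the recommended 0.65 cutoff.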
Performance
Best fold F1 on the desktop training corpus: 0.921.
Architecture
DictaBERT encoder + single Linear classifier head with
problem_type="multi_label_classification". This is a standard
architecture, so the model can be used directly through Hugging Face's
Inference Providers serverless tier:
from huggingface_hub import InferenceClient

# Replace "hf_xxx" with your own Hugging Face access token.
client = InferenceClient(token="hf_xxx")
print(client.text_classification(
    text="...",
    model="alexgoldberg/hebrew-manuscript-genre-classifier",
))
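Because the model is multi-label, the per-label scores still need to be compared against the recommended threshold. The sketch below shows one way to do that on plain (label, score) pairs. The function name `select_genres` is hypothetical, and treating `__NOTA__` as a "none of the above" sentinel that is dropped from the result is an assumption, not something this card specifies.

```python
from typing import Iterable, List, Tuple

# Assumption: "__NOTA__" marks "none of the above" and is not a genre itself.
NOTA_LABEL = "__NOTA__"


def select_genres(
    scored: Iterable[Tuple[str, float]],
    threshold: float = 0.65,
) -> List[str]:
    """Keep labels whose sigmoid score reaches the threshold.

    Returns the surviving genre labels sorted by descending score,
    with the NOTA sentinel excluded.
    """
    kept = [
        (label, score)
        for label, score in scored
        if score >= threshold and label != NOTA_LABEL
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]
```

With the client above, the pairs could be built from the returned elements, for example `select_genres((r.label, r.score) for r in results)`. Whether the endpoint returns scores for all nine labels by default is not stated here; passing `top_k` to `text_classification` may be needed.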
Citation
Trained as part of the MHM Pipeline at Bar-Ilan University. See https://github.com/alexgoldberg/mhm-pipeline for the full training pipeline.