gheim-ch-560m
A multilingual token-classification model for personally-identifiable information
(PII) detection across the four official Swiss languages (de_CH, fr_CH, it_CH, rm)
and English. The model is a fine-tune of
FacebookAI/xlm-roberta-large
on joelbarmettler/gheim-ch-pii-171k.
Output schema is a 33-class BIOES tag set (8 PII categories plus the outside class)
aligned with the categorical naming used by openai/privacy-filter.
| Parameters | 560M |
| Languages | de_CH, fr_CH, it_CH, rm, en |
| Categories | account_number, private_address, private_date, private_email, private_person, private_phone, private_url, secret |
| Tag scheme | BIOES (33 classes) |
| Max sequence length | 512 |
| License | Apache 2.0 |
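The 33-class count follows from the BIOES scheme: each of the 8 categories gets a B (begin), I (inside), E (end), and S (single-token) tag, plus one shared O (outside) tag. The sketch below only reproduces that arithmetic. The `B-private_person` style label strings and their ordering are assumptions for illustration; the `id2label` mapping in the model's config is authoritative.

```python
# The 8 PII categories listed in the table above.
CATEGORIES = [
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
]


def build_bioes_labels(categories):
    """Return one O tag plus B/I/E/S tags for every category.

    The "<prefix>-<category>" format is an assumed naming convention.
    """
    labels = ["O"]
    for category in categories:
        for prefix in ("B", "I", "E", "S"):
            labels.append(f"{prefix}-{category}")
    return labels


LABELS = build_bioes_labels(CATEGORIES)
print(len(LABELS))  # 8 categories * 4 prefixes + 1 outside tag
```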
Full report: the curation pipeline, training procedure, comparison against seven other PII / NER systems, cross-domain results on four external benchmarks, methodology validation, and an extended related-work section are documented in
paper/paper.pdf (arXiv preprint forthcoming). This card is the deployment-facing summary.
Intended use
The model classifies character-level spans of PII so that text can be redacted prior to transmission to systems where personal data should not appear (for example, third-party LLM APIs hosted outside the data subject's jurisdiction). Output spans are intended for substitution or masking, not for entity linking or re-identification. The training data follows a recall-oriented labelling policy under which publicly-listed institutional information (e.g. court switchboard numbers, parliament email addresses, public-official names) is flagged as PII. Applications requiring stricter precision should pair model output with downstream filtering.
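Masking can be sketched as a right-to-left replacement over character offsets. The example assumes span dictionaries with `start`, `end`, and `entity_group` keys, which is the shape the transformers token-classification pipeline returns when an aggregation strategy is set. `span_for`, `mask_spans`, and the bracketed placeholder format are hypothetical helpers, not part of any gheim package, and the spans are written by hand instead of coming from the model.

```python
def span_for(text, value, group):
    """Build a pipeline-style span dict for the first occurrence of value."""
    start = text.index(value)
    return {"entity_group": group, "start": start, "end": start + len(value)}


def mask_spans(text, spans):
    """Replace each span with a bracketed category placeholder."""
    masked = text
    # Replace from the end of the string so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['entity_group']}]"
        masked = masked[:span["start"]] + placeholder + masked[span["end"]:]
    return masked


text = "Call Anna Meier on +41 44 268 12 34."
spans = [
    span_for(text, "Anna Meier", "private_person"),
    span_for(text, "+41 44 268 12 34", "private_phone"),
]
masked = mask_spans(text, spans)
print(masked)
```

This form of masking is one-way: the original values are discarded, which suits redaction but not workflows that need the text restored later.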
Usage
Recommended: gheim SDKs (round-trip with sentinel restoration)
For the typical use case — anonymise text, send to an LLM, restore the
originals on the way back — install the gheim
Python or gheim npm package.
This model is the default detector in both, and the wrappers handle
sentinel allocation, streaming-aware decode, multi-turn coherent
sessions, and a drop-in OpenAI client.
pip install "gheim[local,openai]" # Python
npm install gheim openai @huggingface/transformers # JS / TS
# Python — drop-in OpenAI client. Defaults to gheim-ch-560m.
from gheim.openai import OpenAI
client = OpenAI() # accepts the same kwargs as openai.OpenAI
r = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Hi, my name is Joel. My phone is +41 44 268 12 34."}],
)
# r.choices[0].message.content has the original PII restored.
# OpenAI only ever saw "<PERSON_1>" and "<PHONE_1>".
// JS / TS — same idea.
import { OpenAI } from "gheim/openai";
const client = new OpenAI(); // accepts the same opts as openai's OpenAI
const r = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user",
               content: "Hi, my name is Joel. My phone is +41 44 268 12 34." }],
});
Streaming, async, tool calls, and 9 other text-carrying endpoints
(responses, embeddings, moderations, audio.*, images.*) are
wrapped automatically. Full surface in the package READMEs:
Python
·
JS.
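The round trip described above can be illustrated with a toy version of sentinel allocation and restoration. This is a sketch of the general idea only, not the gheim implementation: `SentinelSession`, its methods, and the rule that derives `<PERSON_1>` from `private_person` are hypothetical, and detection is stubbed with hand-written spans.

```python
class SentinelSession:
    """Toy reversible anonymiser: the same original value maps to the same sentinel."""

    def __init__(self):
        self.by_value = {}     # (category, original) -> sentinel
        self.by_sentinel = {}  # sentinel -> original
        self.counters = {}     # category -> last index handed out

    def _sentinel_for(self, category, original):
        key = (category, original)
        if key not in self.by_value:
            index = self.counters.get(category, 0) + 1
            self.counters[category] = index
            # Assumed naming rule: drop the "private_" prefix and uppercase.
            prefix = "private_"
            name = category[len(prefix):] if category.startswith(prefix) else category
            sentinel = f"<{name.upper()}_{index}>"
            self.by_value[key] = sentinel
            self.by_sentinel[sentinel] = original
        return self.by_value[key]

    def anonymise(self, text, spans):
        ordered = sorted(spans, key=lambda s: s["start"])
        # Allocate in reading order so numbering follows the text.
        sentinels = [
            self._sentinel_for(s["entity_group"], text[s["start"]:s["end"]])
            for s in ordered
        ]
        # Substitute right-to-left so earlier offsets stay valid.
        for span, sentinel in reversed(list(zip(ordered, sentinels))):
            text = text[:span["start"]] + sentinel + text[span["end"]:]
        return text

    def restore(self, text):
        for sentinel, original in self.by_sentinel.items():
            text = text.replace(sentinel, original)
        return text


def span_for(text, value, group):
    start = text.index(value)
    return {"entity_group": group, "start": start, "end": start + len(value)}


session = SentinelSession()
prompt = "Hi, my name is Joel. My phone is +41 44 268 12 34."
spans = [
    span_for(prompt, "Joel", "private_person"),
    span_for(prompt, "+41 44 268 12 34", "private_phone"),
]
outbound = session.anonymise(prompt, spans)       # what the LLM would see
reply = session.restore("Nice to meet you, <PERSON_1>!")
print(outbound)
print(reply)
```

Keeping the mapping on a session object is what lets a later turn reuse `<PERSON_1>` for the same name, which is the multi-turn coherence mentioned above. Streaming-aware decoding, where a sentinel may be split across chunks, is not covered by this sketch.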
Alternative: raw transformers / transformers.js
If you only need a token classifier (no sentinel round-trip), use the Hugging Face pipelines directly.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
repo = "joelbarmettler/gheim-ch-560m"
tok = AutoTokenizer.from_pretrained(repo)
mdl = AutoModelForTokenClassification.from_pretrained(repo)
ner = pipeline("token-classification", model=mdl, tokenizer=tok,
               aggregation_strategy="simple")
text = ("Bitte überweisen Sie an Müller AG, IBAN CH9300762011623852957, "
        "Werdstrasse 36, 8004 Zürich.")
for span in ner(text):
    print(f"{span['entity_group']:<18} {span['score']:.2f} {span['word']!r}")
// Node / Bun / browser via @huggingface/transformers (transformers.js).
// The Hub repo only ships the int8 ONNX, so pass dtype: "q8".
import { pipeline } from "@huggingface/transformers";
const ner = await pipeline("token-classification",
  "joelbarmettler/gheim-ch-560m",
  { aggregation_strategy: "simple", dtype: "q8" });
const out = await ner("Email me at alice@example.ch, phone +41 44 268 12 34.");
Performance
Strict-span F1 (seqeval) on the held-out test split of
joelbarmettler/gheim-ch-pii-171k
(15,861 chunks, document-isolated from the training split for the real-text
portion). The test set was scored once.
| Metric | Test | Validation |
|---|---|---|
| F1 | 0.916 | 0.918 |
| Precision | 0.907 | 0.908 |
| Recall | 0.926 | 0.926 |
Per-language × per-category char-level F1 on the same test split. Body cells are char-level F1 for each (language, category) pair; the right-most column gives the per-category average over languages (gold-weighted), and the bottom row gives the per-language average over categories. The bottom-right cell is the overall char F1.
| Category | de_ch | fr_ch | it_ch | rm | en | Avg. |
|---|---|---|---|---|---|---|
| account_number | 0.932 | 0.717 | 0.660 | 0.169 | 0.994 | 0.765 |
| private_address | 0.890 | 0.870 | 0.915 | 0.825 | 0.973 | 0.889 |
| private_date | 0.943 | 0.934 | 0.952 | 0.888 | 0.909 | 0.939 |
| private_email | 0.988 | 0.994 | 0.999 | 0.992 | 0.999 | 0.994 |
| private_person | 0.938 | 0.948 | 0.962 | 0.897 | 0.951 | 0.944 |
| private_phone | 0.989 | 0.985 | 0.993 | 0.995 | 0.997 | 0.990 |
| private_url | 0.992 | 0.994 | 0.994 | 0.958 | 0.980 | 0.992 |
| secret | 0.999 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 |
| Avg. | 0.954 | 0.953 | 0.972 | 0.923 | 0.981 | 0.958 |
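The two tables use different scoring units, and a toy example may help when comparing them. Strict-span scoring counts a prediction as correct only if both boundaries and the category match exactly, whereas char-level scoring gives partial credit for overlapping characters. The functions below are a minimal sketch under those definitions. The card does not spell out its exact char-level computation, so `char_level_f1` is one plausible reading and not the evaluation code, and the spans are invented.

```python
def f1(true_positives, n_predicted, n_gold):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_predicted
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)


def strict_span_f1(gold, pred):
    """A span counts only if (start, end, label) matches exactly."""
    gold_set, pred_set = set(gold), set(pred)
    return f1(len(gold_set & pred_set), len(pred_set), len(gold_set))


def char_level_f1(gold, pred):
    """Every labelled character is scored on its own."""
    def labelled_chars(spans):
        return {(i, label) for start, end, label in spans for i in range(start, end)}

    gold_chars, pred_chars = labelled_chars(gold), labelled_chars(pred)
    return f1(len(gold_chars & pred_chars), len(pred_chars), len(gold_chars))


# Invented spans: the address prediction stops two characters early.
gold = [(0, 10, "private_person"), (20, 30, "private_address")]
pred = [(0, 10, "private_person"), (20, 28, "private_address")]

strict = strict_span_f1(gold, pred)
char = char_level_f1(gold, pred)
print(f"strict-span F1 = {strict:.3f}, char-level F1 = {char:.3f}")
```

In this example a single boundary error costs a whole span under strict scoring but only two characters under char-level scoring, which is one reason char-level figures can sit above strict-span figures on the same predictions.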
For the full comparison against seven other open PII / NER systems on the
same Swiss test set, cross-domain evaluation on four external benchmarks
(AI4Privacy openpii-1m, ZurichNLP swissner, CoNLL-2003, Babelscape
WikiNeural), and the methodology-validation reproductions of each
baseline's published numbers, see
paper/paper.pdf
§4 and the machine-readable matrix at
eval/positioning_matrix.json.
Deployment formats
The model is published in two formats:
- model.safetensors (root): fp32 PyTorch checkpoint, 2.2 GB, intended for server-side inference via transformers.
- onnx/model_quantized.onnx: int8 dynamic-quantised ONNX export, 552 MB, intended for in-browser inference via @huggingface/transformers on WebGPU or WebAssembly. Selected with dtype: "q8".
| Format | Size | Test F1 | Δ vs fp32 |
|---|---|---|---|
| PyTorch fp32 | 2.2 GB | 0.916 | (baseline) |
| ONNX int8 (dynamic) | 552 MB | 0.909 | -0.7 pp |
The int8 export loses 0.7 pp overall F1 relative to the PyTorch checkpoint;
the loss is concentrated in private_address (-2.1 pp) and account_number
(-5.3 pp), the two categories that were already weakest under fp32. All
five languages are affected uniformly within ±0.001 pp of the overall
drop. Per-category and per-language quantisation deltas are tabulated in
paper/paper.pdf
§3.3.
Training procedure
The base model was selected in a controlled bake-off between
xlm-roberta-large and ZurichNLP/swissbert (270M dense). Each model
received an identical 5 × 3 sweep over (learning rate, layer-wise LR
decay) at 1 epoch; the winning configuration per base model was then
trained for 3 full epochs and selected by best validation F1.
xlm-roberta-large won the bake-off (val F1 0.918 vs swissbert's 0.910).
Selected configuration: AdamW, LR 5e-5 on a cosine schedule with 5%
warmup, no layer-wise LR decay (LLRD), effective batch 128 (per-device
64 × 2 GPUs, DDP), bf16, 3 epochs, max sequence length 512. The best
checkpoint by validation overall_f1 was at epoch 2.50. Wall time ≈ 52 min
train + 4 min eval on 2 × RTX 4090. The full procedure, including the
hyperparameter sweep results, is in
paper/paper.pdf
§3.
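The learning-rate schedule and batch arithmetic in the selected configuration can be sketched in plain Python. The shape below, linear warmup followed by cosine decay to zero, is the common convention for "cosine with warmup"; the card does not state the final learning rate, so decay to zero is an assumption, as are the `TOTAL_STEPS` value and the absence of gradient accumulation.

```python
import math

PEAK_LR = 5e-5          # from the selected configuration
WARMUP_FRACTION = 0.05  # 5% warmup
PER_DEVICE_BATCH = 64
NUM_GPUS = 2


def effective_batch_size(per_device=PER_DEVICE_BATCH, gpus=NUM_GPUS, grad_accum=1):
    """Samples per optimizer step; grad_accum=1 is an assumption."""
    return per_device * gpus * grad_accum


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_fraction=WARMUP_FRACTION):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed floor)."""
    warmup_steps = max(1, round(total_steps * warmup_fraction))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given in the card

print(effective_batch_size())
for step in (0, 25, 50, 525, 1000):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```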
The training data was the train split of gheim-ch-pii-171k (139,641
chunks) plus an English-anchor and Swiss-region email rescue slice from
ai4privacy/pii-masking-openpii-1m (≈ 14,000 chunks); validation and
test contain no AI4Privacy data.
Limitations
- Recall-oriented labelling policy. The model inherits the dataset's policy of flagging publicly-listed institutional contact information. Applications needing stricter precision should apply downstream filtering or a private-vs-public-entity post-classifier.
- private_address test F1 is 0.78. Boundary placement on multi-token addresses is the dominant error mode.
- account_number test F1 is 0.67. For production use, pair the model with the regex front-end documented in the gheim library, which applies checksum validation (IBAN, AHV, VAT-CHE, Luhn).
- Romansh test F1 is 0.85, the weakest of the five languages. The RM training material is dominated by a single literary/journalistic register; performance on dialectal or technical RM text is unmeasured.
- Swiss German dialect (GSW) is not measured. The fasttext detector used in data preparation labels GSW as standard German.
- Re-identification is not in scope. The model is intended for redaction; it does not return entity-linked identifiers.
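Checksum validation of the kind mentioned for account numbers can be sketched with two well-known algorithms: the ISO 13616 mod-97 check for IBANs and the Luhn check used by payment card numbers. This is an illustrative sketch, not the gheim library's code. The function names are hypothetical, AHV and VAT-CHE checks are not covered, and country-specific IBAN lengths are not enforced.

```python
def iban_is_valid(iban):
    """ISO 13616 mod-97 check. Does not verify country-specific length."""
    s = "".join(iban.split()).upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    # Move country code and check digits to the end, then map A..Z to 10..35.
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def luhn_is_valid(number):
    """Luhn check: double every second digit from the right, sum, mod 10."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# The IBAN from the usage example in this card, and the same value with
# its last digit changed.
print(iban_is_valid("CH9300762011623852957"))
print(iban_is_valid("CH9300762011623852958"))
print(luhn_is_valid("79927398713"))
```

A candidate that matches an account-number pattern but fails its checksum can be dropped, and a pattern match that passes can be kept even when the model missed it, which is how a regex front-end can offset the weaker account_number scores.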
License
Apache 2.0, inherited from the base model FacebookAI/xlm-roberta-large.
The training data
(joelbarmettler/gheim-ch-pii-171k)
is released under CC BY 4.0; attribution to its upstream corpora (the
swiss-ai/apertus-pretrain-* datasets) is required when reusing the data.
Citation
@misc{barmettler2026gheim_ch_560m,
title = {gheim-ch-560m: A multilingual PII detection model for the Swiss market},
author = {Joel Barmettler},
year = {2026},
url = {https://huggingface.co/joelbarmettler/gheim-ch-560m}
}
If the model is used in published work, please also cite the dataset:
@misc{barmettler2026gheim_ch_pii,
title = {gheim-ch-pii-171k: A Swiss-grounded PII NER dataset with synthetic gap-fill},
author = {Joel Barmettler},
year = {2026},
url = {https://huggingface.co/datasets/joelbarmettler/gheim-ch-pii-171k}
}
Maintainer
Joel Barmettler · jbarmettler@proton.me · joelbarmettler.xyz · github.com/joelbarmettlerUZH/gheim
Source code, issue tracker, and the wider gheim ecosystem (Python and Node libraries, redaction server, composite detector) are at github.com/joelbarmettlerUZH/gheim.