Multilingual Fine-Tuned Privacy Filter

This model is a fine-tuned version of openai/privacy-filter for multilingual PII token classification.

Public model URL:

https://huggingface.co/emiemimi/privacy-filter-multilingual-500

Training Data

  • Dataset: ai4privacy/pii-masking-openpii-1m
  • Languages: English, Polish, Swedish, German, French, Spanish
  • Training setting: 500 examples per language
  • Base model: openai/privacy-filter

OpenPII labels were mapped to the output label set used by openai/privacy-filter, including person, email, phone, date, address, account number, and secret categories.
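The mapping step can be sketched as a lookup table applied to each annotated span. The table below is a hypothetical illustration only: the OpenPII label names, the target category names, and the choice to drop unmapped labels are all assumptions, not the mapping actually used for this model.

```python
from typing import Dict, List, Optional

# ASSUMPTION: illustrative label names on both sides; the real mapping
# used for fine-tuning is not given in this model card.
OPENPII_TO_FILTER: Dict[str, str] = {
    "GIVENNAME": "person",
    "SURNAME": "person",
    "EMAIL": "email",
    "TELEPHONENUM": "phone",
    "DATE": "date",
    "STREET": "address",
    "CITY": "address",
    "ZIPCODE": "address",
    "ACCOUNTNUM": "account_number",
    "PASSWORD": "secret",
}


def map_label(openpii_label: str) -> Optional[str]:
    """Return the mapped category, or None when the label has no target."""
    return OPENPII_TO_FILTER.get(openpii_label.upper())


def remap_spans(spans: List[dict]) -> List[dict]:
    """Rewrite each span's label; spans with unmapped labels are dropped (assumption)."""
    remapped = []
    for span in spans:
        mapped = map_label(span["label"])
        if mapped is not None:
            remapped.append({**span, "label": mapped})
    return remapped
```

Several source labels collapse into one target category here (for example, given name and surname both become person), which is one plausible reason a category-aware evaluation can score lower than an overlap-only one.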

Evaluation

The final evaluation retained here uses the shared set of 50 examples per language, 300 rows in total across the six languages.

Evaluation  Language  Texts  Precision  Recall  F1
simple      de        50     0.925      0.933   0.929
improved    de        50     0.914      0.924   0.919
simple      en        50     0.958      0.964   0.961
improved    en        50     0.940      0.946   0.943
simple      es        50     0.928      0.967   0.947
improved    es        50     0.902      0.940   0.921
simple      fr        50     0.969      0.946   0.957
improved    fr        50     0.944      0.918   0.931
simple      pl        50     0.888      0.925   0.906
improved    pl        50     0.859      0.892   0.875
simple      sv        50     0.900      0.928   0.914
improved    sv        50     0.867      0.893   0.880
simple      overall   300    0.926      0.942   0.934
improved    overall   300    0.901      0.917   0.909

The simple evaluation counts a prediction as correct when it overlaps a gold PII span. The improved evaluation additionally requires the mapped PII category to match, which is why its scores are lower in every row.
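The two matching rules can be sketched as follows. This is a minimal illustration, assuming spans are character offsets with an exclusive end and that precision and recall are computed over predicted and gold spans respectively; the `Span` type and the `score` function are hypothetical names, and the project's actual evaluation script may differ in detail.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # mapped PII category


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def is_match(pred: Span, gold: Span, require_label: bool) -> bool:
    """Overlap-only match ("simple"), or overlap plus same category ("improved")."""
    return overlaps(pred, gold) and (not require_label or pred.label == gold.label)


def score(preds: List[Span], golds: List[Span],
          require_label: bool = False) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the chosen matching rule."""
    matched_preds = sum(any(is_match(p, g, require_label) for g in golds) for p in preds)
    matched_golds = sum(any(is_match(p, g, require_label) for p in preds) for g in golds)
    precision = matched_preds / len(preds) if preds else 0.0
    recall = matched_golds / len(golds) if golds else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


golds = [Span(0, 5, "person"), Span(10, 20, "email")]
preds = [Span(0, 4, "person"), Span(12, 20, "phone"), Span(30, 35, "date")]

simple = score(preds, golds, require_label=False)
improved = score(preds, golds, require_label=True)
```

In this toy case the second prediction overlaps a gold span but carries the wrong category, so it counts under the simple rule and not under the improved one.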

Usage

from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "emiemimi/privacy-filter-multilingual-500"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
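Once the tokenizer and model are loaded, token-level predictions still need to be turned into character spans. The sketch below is one possible way to do that, under several assumptions: the tokenizer is a fast tokenizer that supports offset mappings, the labels in `model.config.id2label` use `O` for non-PII tokens with optional `B-`/`I-`/`E-`/`S-` prefixes, and plain argmax decoding is acceptable (the base model may recommend a different decoding procedure). The helper names `split_tag`, `merge_tokens`, and `predict_spans` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split 'B-email' into ('B', 'email'); unprefixed tags return ('', tag)."""
    if len(tag) > 2 and tag[1] == "-" and tag[0] in "BIES":
        return tag[0], tag[2:]
    return "", tag


def merge_tokens(offsets: Sequence[Sequence[int]], tags: Sequence[str],
                 outside: str = "O") -> List[Tuple[int, int, str]]:
    """Merge consecutive tokens of the same category into (start, end, category) spans."""
    spans: List[Tuple[int, int, str]] = []
    open_span = False
    for (start, end), tag in zip(offsets, tags):
        if start == end:  # zero-width offsets mark special tokens
            continue
        if tag == outside:
            open_span = False
            continue
        prefix, category = split_tag(tag)
        if open_span and prefix not in ("B", "S") and spans[-1][2] == category:
            spans[-1] = (spans[-1][0], end, category)  # extend the current span
        else:
            spans.append((start, end, category))  # start a new span
        open_span = True
    return spans


def predict_spans(text: str, tokenizer, model) -> List[Tuple[int, int, str]]:
    """Run the model on one text and return character-level PII spans."""
    import torch  # imported lazily so the helpers above work without torch

    enc = tokenizer(text, return_offsets_mapping=True, return_tensors="pt")
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        logits = model(**enc).logits[0]
    tags = [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]
    return merge_tokens(offsets, tags)
```

With the objects loaded above, `predict_spans("Contact Anna at anna@example.com", tokenizer, model)` would return a list of `(start, end, category)` tuples; the actual categories depend on the label set stored in the model's configuration.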

Project Code

The group GitHub repository should link directly to this model page and include the fine-tuning and shared-50 evaluation scripts.

Model size: 1B params (Safetensors; tensor types F32 and BF16)