Multilingual Fine-Tuned Privacy Filter
This model is a fine-tuned version of openai/privacy-filter for multilingual PII token classification.
Public model URL: https://huggingface.co/emiemimi/privacy-filter-multilingual-500
Training Data
- Dataset: ai4privacy/pii-masking-openpii-1m
- Languages: English, Polish, Swedish, German, French, Spanish
- Training setting: 500 examples per language
- Base model: openai/privacy-filter
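The "500 examples per language" setting can be pictured as a fixed-size sample drawn from each language's rows. The sketch below is only an illustration under assumptions: the `language` field name, the `sample_per_language` helper, and the seeded random sampling are hypothetical, and the model card does not say how the 500 examples were actually selected.

```python
import random
from collections import defaultdict

# Language codes as they appear in the evaluation table of this card.
LANGUAGES = ["en", "pl", "sv", "de", "fr", "es"]

def sample_per_language(rows, languages, n_per_language, seed=0):
    """Return up to n_per_language rows for each language code in `languages`."""
    by_lang = defaultdict(list)
    for row in rows:
        # "language" is an assumed field name, not confirmed by the model card.
        if row["language"] in languages:
            by_lang[row["language"]].append(row)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled = []
    for lang in languages:
        pool = by_lang[lang]
        k = min(n_per_language, len(pool))  # tolerate languages with few rows
        sampled.extend(rng.sample(pool, k))
    return sampled
```

Calling `sample_per_language(rows, LANGUAGES, 500)` on a list of row dictionaries would then yield at most 3,000 training rows, 500 for each of the six languages.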
OpenPII labels were mapped to the output label set used by openai/privacy-filter, including person, email, phone, date, address, account number, and secret categories.
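A label mapping of this kind is usually a lookup table applied to every annotated span. The sketch below is hypothetical: the source label names on the left are illustrative placeholders rather than the confirmed OpenPII label set, the target names simply reuse the category words listed above, and dropping unmapped labels is an assumed policy, not something the model card states.

```python
# Hypothetical mapping from source labels to target categories.
# Keys are illustrative placeholders; values reuse the categories named in this card.
LABEL_MAP = {
    "GIVENNAME": "person",
    "SURNAME": "person",
    "EMAIL": "email",
    "TELEPHONENUM": "phone",
    "DATE": "date",
    "STREET": "address",
    "CITY": "address",
    "ACCOUNTNUM": "account number",
    "PASSWORD": "secret",
}

def map_spans(spans, label_map=LABEL_MAP):
    """Rename span labels; drop spans whose label has no target category."""
    mapped = []
    for span in spans:
        target = label_map.get(span["label"])
        if target is not None:  # assumed policy: unmapped labels are discarded
            mapped.append({**span, "label": target})
    return mapped
```

Several source labels can collapse into one target category, which is why the category match in the stricter evaluation is checked after mapping.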
Evaluation
The retained final evaluation uses a shared set of 50 examples per language, 300 rows in total.
| Evaluation | Language | Texts | Precision | Recall | F1 |
|---|---|---|---|---|---|
| simple | de | 50 | 0.925 | 0.933 | 0.929 |
| improved | de | 50 | 0.914 | 0.924 | 0.919 |
| simple | en | 50 | 0.958 | 0.964 | 0.961 |
| improved | en | 50 | 0.940 | 0.946 | 0.943 |
| simple | es | 50 | 0.928 | 0.967 | 0.947 |
| improved | es | 50 | 0.902 | 0.940 | 0.921 |
| simple | fr | 50 | 0.969 | 0.946 | 0.957 |
| improved | fr | 50 | 0.944 | 0.918 | 0.931 |
| simple | pl | 50 | 0.888 | 0.925 | 0.906 |
| improved | pl | 50 | 0.859 | 0.892 | 0.875 |
| simple | sv | 50 | 0.900 | 0.928 | 0.914 |
| improved | sv | 50 | 0.867 | 0.893 | 0.880 |
| simple | overall | 300 | 0.926 | 0.942 | 0.934 |
| improved | overall | 300 | 0.901 | 0.917 | 0.909 |
`simple` counts a prediction as correct when it overlaps a gold PII span. `improved` additionally requires the mapped PII category of the prediction to match that of the gold span.
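The two matching rules can be sketched as follows. This is a minimal illustration, not the project's evaluation script: it assumes spans are dictionaries with character offsets `start`, `end` and a `label`, that any shared character counts as an overlap, and that precision is computed over predictions and recall over gold spans. How the actual scripts treat multiple overlaps or aggregate across texts is not stated in this card.

```python
def overlaps(a, b):
    """True when character spans a and b share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]

def score(predicted, gold, require_label=False):
    """Return (precision, recall, f1) for predicted vs. gold spans of one text.

    require_label=False approximates the "simple" rule (overlap only);
    require_label=True approximates the "improved" rule (overlap and same category).
    """
    def match(p, g):
        return overlaps(p, g) and (not require_label or p["label"] == g["label"])

    # A prediction is correct if it matches any gold span; a gold span is
    # found if any prediction matches it.
    correct_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    found_gold = sum(any(match(p, g) for p in predicted) for g in gold)

    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = [
    {"start": 0, "end": 5, "label": "person"},
    {"start": 10, "end": 20, "label": "email"},
]
predicted = [
    {"start": 0, "end": 4, "label": "person"},   # overlaps, same category
    {"start": 12, "end": 20, "label": "phone"},  # overlaps, wrong category
]
print(score(predicted, gold))                      # (1.0, 1.0, 1.0)
print(score(predicted, gold, require_label=True))  # (0.5, 0.5, 0.5)
```

The second prediction illustrates why the `improved` rows in the table are never higher than the `simple` rows: every prediction that passes the stricter rule also passes the looser one.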
Usage
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "emiemimi/privacy-filter-multilingual-500"

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
```
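Once the model is loaded, a typical next step is to turn its predictions into redacted text. The sketch below assumes the model works with the standard Transformers `token-classification` pipeline and that `aggregation_strategy="simple"` yields non-overlapping spans carrying `entity_group`, `start`, and `end`. The `redact` and `detect_and_redact` helpers are hypothetical names introduced for illustration, and the label strings the model actually emits may differ from the ones in the example.

```python
def redact(text, entities):
    """Replace each detected span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[:ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"]:]
    return out

def detect_and_redact(text, model_id="emiemimi/privacy-filter-multilingual-500"):
    """Run the token-classification pipeline and redact what it finds."""
    # Imported lazily; the first call downloads the model from the Hub.
    from transformers import pipeline

    pipe = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word tokens into whole spans
    )
    return redact(text, pipe(text))

# redact() can be tried without the model, using hand-written entities
# in the same shape the pipeline returns:
example = redact(
    "Mail anna@example.com now",
    [{"entity_group": "email", "start": 5, "end": 21}],
)
print(example)  # Mail [email] now
```

Keeping the redaction step separate from the model call makes it easy to test the string handling on its own and to swap in a different masking policy later.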
Project Code
The group GitHub repository should link directly to this model page and include the fine-tuning and shared-50 evaluation scripts.