---
license: apache-2.0
base_model: distilbert-base-cased
tags:
  - generated_from_trainer
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: distilbert-base-cased-pii-en
    results: []
---

distilbert-base-cased-pii-en

This model is a fine-tuned version of distilbert-base-cased on an unknown dataset. It achieves the following results on the evaluation set, which match the last logged evaluation (step 4000) in the training results below:

Loss: 0.0412
Bod F1: 0.9572
Building F1: 0.9765
Cardissuer F1: 0.0
City F1: 0.9467
Country F1: 0.9664
Date F1: 0.9008
Driverlicense F1: 0.9304
Email F1: 0.9844
Geocoord F1: 0.9655
Givenname1 F1: 0.8097
Givenname2 F1: 0.5922
Idcard F1: 0.9202
Ip F1: 0.9807
Lastname1 F1: 0.7518
Lastname2 F1: 0.4932
Lastname3 F1: 0.0948
Pass F1: 0.8835
Passport F1: 0.9392
Postcode F1: 0.9766
Secaddress F1: 0.9749
Sex F1: 0.9687
Socialnumber F1: 0.9334
State F1: 0.9744
Street F1: 0.9534
Tel F1: 0.9553
Time F1: 0.9619
Title F1: 0.9502
Username F1: 0.9495
Precision: 0.9163
Recall: 0.9342
F1: 0.9252
Accuracy: 0.9903
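The sketch below checks the overall scores for consistency. It recomputes F1 from the reported precision and recall, then compares it with an unweighted mean of the per-entity F1 scores. It assumes the overall scores are micro-averaged over entity mentions, as seqeval-style token-classification metrics usually are. The card does not state the averaging method.

```python
"""Consistency check on the overall evaluation scores (a sketch, not the card's evaluation code).

Assumption: overall Precision/Recall/F1 are micro-averaged over entity mentions,
so the overall F1 should equal the harmonic mean of overall precision and recall.
"""

# Overall scores copied from the evaluation results above.
PRECISION = 0.9163
RECALL = 0.9342
REPORTED_F1 = 0.9252

# Per-entity F1 scores copied from the evaluation results above.
PER_ENTITY_F1 = {
    "Bod": 0.9572, "Building": 0.9765, "Cardissuer": 0.0, "City": 0.9467,
    "Country": 0.9664, "Date": 0.9008, "Driverlicense": 0.9304, "Email": 0.9844,
    "Geocoord": 0.9655, "Givenname1": 0.8097, "Givenname2": 0.5922, "Idcard": 0.9202,
    "Ip": 0.9807, "Lastname1": 0.7518, "Lastname2": 0.4932, "Lastname3": 0.0948,
    "Pass": 0.8835, "Passport": 0.9392, "Postcode": 0.9766, "Secaddress": 0.9749,
    "Sex": 0.9687, "Socialnumber": 0.9334, "State": 0.9744, "Street": 0.9534,
    "Tel": 0.9553, "Time": 0.9619, "Title": 0.9502, "Username": 0.9495,
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recomputed_f1 = f1_score(PRECISION, RECALL)
# Unweighted mean: every entity type counts equally, however rare it is.
macro_f1 = sum(PER_ENTITY_F1.values()) / len(PER_ENTITY_F1)
# The four entity types with the lowest F1.
weakest = sorted(PER_ENTITY_F1, key=PER_ENTITY_F1.get)[:4]

print(f"F1 from precision and recall: {recomputed_f1:.4f} (reported {REPORTED_F1})")
print(f"Unweighted mean of {len(PER_ENTITY_F1)} per-entity F1 scores: {macro_f1:.4f}")
print("Weakest entity types:", ", ".join(weakest))
```

Under that assumption, the recomputed F1 agrees with the reported 0.9252 to four decimals. The unweighted mean falls well below it because Cardissuer, Lastname3, Lastname2 and Givenname2 count as much as the strong types. The per-entity scores are therefore the better guide to which fields the model detects reliably.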

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 64
eval_batch_size: 128
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.2
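lr_scheduler_warmup_steps: 3000 (in Transformers a nonzero warmup_steps overrides warmup_ratio, so the 0.2 ratio had no effect on this run)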
num_epochs: 10
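Both a warmup ratio and a warmup step count are listed, so it helps to see which one takes effect. The sketch below is a plain-Python reimplementation, not the library code. It follows the shape of Transformers' `get_cosine_schedule_with_warmup` (linear warmup, then half-cosine decay) and applies the rule that a nonzero warmup step count wins over the ratio. The total of 4,680 steps is an estimate: 468 steps per epoch, inferred from the Step and Epoch columns of the training results, times 10 epochs.

```python
import math

# Hyperparameters from the list above.
PEAK_LR = 2e-5
WARMUP_RATIO = 0.2
WARMUP_STEPS = 3000
NUM_EPOCHS = 10
# Assumption: 468 optimizer steps per epoch, inferred from the Step/Epoch
# columns of the training results (e.g. step 1000 at epoch 2.1368).
STEPS_PER_EPOCH = 468
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # estimated, 4680


def effective_warmup(total_steps: int, warmup_steps: int, warmup_ratio: float) -> int:
    """A nonzero warmup_steps takes precedence; otherwise use a fraction of total steps."""
    return warmup_steps if warmup_steps > 0 else math.ceil(total_steps * warmup_ratio)


def cosine_with_warmup(step: int, warmup: int, total: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay down to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


warmup = effective_warmup(TOTAL_STEPS, WARMUP_STEPS, WARMUP_RATIO)
print(f"effective warmup: {warmup} of {TOTAL_STEPS} steps")
for step in (0, 1000, 2000, 3000, 4000, TOTAL_STEPS):
    lr = cosine_with_warmup(step, warmup, TOTAL_STEPS, PEAK_LR)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

Under these assumptions, the learning rate keeps ramping up for the first 3,000 steps, roughly 64% of the run. It reaches the 2e-05 peak only at step 3000. A 0.2 ratio on its own would have ended warmup after about 936 steps.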

Training results

Training Loss	Epoch	Step	Validation Loss	Bod F1	Building F1	City F1	Country F1	Date F1	Driverlicense F1	Email F1	Geocoord F1	Givenname1 F1	Givenname2 F1	Idcard F1	Ip F1	Lastname1 F1	Lastname2 F1	Lastname3 F1	Pass F1	Passport F1	Postcode F1	Secaddress F1	Sex F1	Socialnumber F1	State F1	Street F1	Tel F1	Time F1	Title F1	Username F1	Precision	Recall	F1	Accuracy
0.2231	2.1368	1000	0.1075	0.8895	0.9243	0.6385	0.8816	0.7987	0.6178	0.9512	0.6982	0.4720	0.0	0.5863	0.9082	0.5397	0.0	0.0	0.6402	0.6167	0.7858	0.6568	0.8626	0.7003	0.8859	0.6843	0.8146	0.9158	0.7302	0.8258	0.7239	0.7677	0.7452	0.9739
0.069	4.2735	2000	0.0540	0.9478	0.9698	0.9055	0.9433	0.8854	0.8801	0.9783	0.9676	0.7201	0.2896	0.8815	0.9731	0.6380	0.1939	0.0	0.8266	0.8883	0.9592	0.9645	0.9370	0.8931	0.9390	0.9237	0.9386	0.9455	0.9087	0.9195	0.8707	0.9044	0.8872	0.9865
0.0447	6.4103	3000	0.0455	0.9537	0.9756	0.9327	0.9593	0.9007	0.9030	0.9792	0.9633	0.7860	0.4337	0.9056	0.9747	0.7205	0.3587	0.0	0.8557	0.9144	0.9712	0.9732	0.9661	0.9204	0.9689	0.9426	0.9552	0.9588	0.9374	0.9413	0.9011	0.9232	0.9120	0.9887
0.0293	8.5470	4000	0.0412	0.9572	0.9765	0.9467	0.9664	0.9008	0.9304	0.9844	0.9655	0.8097	0.5922	0.9202	0.9807	0.7518	0.4932	0.0948	0.8835	0.9392	0.9766	0.9749	0.9687	0.9334	0.9744	0.9534	0.9553	0.9619	0.9502	0.9495	0.9163	0.9342	0.9252	0.9903
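Together, the Step and Epoch columns pin down how many optimizer steps make up one epoch. The snippet below copies five columns from the table (training loss, epoch, step, validation loss, overall F1) and derives that figure. It assumes the fractional Epoch value is the global step divided by the steps per epoch, which is how it reads when there is no gradient accumulation.

```python
# A few columns copied from the training results table above:
# (training loss, epoch, step, validation loss, overall F1)
ROWS = [
    (0.2231, 2.1368, 1000, 0.1075, 0.7452),
    (0.0690, 4.2735, 2000, 0.0540, 0.8872),
    (0.0447, 6.4103, 3000, 0.0455, 0.9120),
    (0.0293, 8.5470, 4000, 0.0412, 0.9252),
]
NUM_EPOCHS = 10  # from the training hyperparameters

# Assumption: epoch = step / steps_per_epoch, so step / epoch recovers
# steps_per_epoch up to the 4-decimal rounding of the Epoch column.
steps_per_epoch = [step / epoch for _, epoch, step, _, _ in ROWS]
estimated_steps_per_epoch = round(sum(steps_per_epoch) / len(steps_per_epoch))
estimated_total_steps = estimated_steps_per_epoch * NUM_EPOCHS

val_losses = [row[3] for row in ROWS]
overall_f1 = [row[4] for row in ROWS]

print("Steps per epoch implied by each row:", [round(s, 2) for s in steps_per_epoch])
print(f"Estimated total steps: {estimated_total_steps}")
print(f"Steps after the last logged evaluation: {estimated_total_steps - ROWS[-1][2]}")
# Did every evaluation improve on the previous one?
print("Validation loss decreasing:", all(a > b for a, b in zip(val_losses, val_losses[1:])))
print("Overall F1 increasing:", all(a < b for a, b in zip(overall_f1, overall_f1[1:])))
```

Each row implies about 468 steps per epoch, or roughly 4,680 steps over 10 epochs. The last logged evaluation (step 4000, epoch 8.547) therefore comes about 680 steps before the end of training. The summary results at the top of the card match that step-4000 row.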

Framework versions

Transformers 4.41.2
PyTorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1
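To reproduce the run, it helps to check the local environment against these versions. The sketch below compares installed distributions with the listed ones using `importlib.metadata` from the standard library. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. It also ignores local build tags such as `+cu121`, because a CPU or differently built torch wheel reports a different suffix for the same release.

```python
from importlib import metadata

# Versions listed under "Framework versions" above, keyed by PyPI distribution name
# (assumption: the PyTorch entry corresponds to the "torch" distribution).
PINNED = {
    "transformers": "4.41.2",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def public_version(version: str) -> str:
    """Drop a local build tag such as '+cu121' or '+cpu'."""
    return version.split("+", 1)[0]


def version_mismatches(pinned, lookup=metadata.version):
    """Return {name: (listed, installed or None)} for packages whose public version differs."""
    mismatches = {}
    for name, listed in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or public_version(installed) != public_version(listed):
            mismatches[name] = (listed, installed)
    return mismatches


if __name__ == "__main__":
    for name, (listed, installed) in version_mismatches(PINNED).items():
        print(f"{name}: card lists {listed}, found {installed or 'not installed'}")
```

The `lookup` parameter defaults to `metadata.version`. Accepting it as an argument lets the comparison logic be exercised with a stand-in function, without touching the real environment.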