Isotonic/deberta-v3-base_finetuned_ai4privacy_v2

@@ -6,6 +6,14 @@ tags:
 model-index:
 - name: deberta-v3-base_finetuned_ai4privacy_v2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,73 +21,20 @@ should probably proofread and complete it, then remove this comment. -->
 # deberta-v3-base_finetuned_ai4privacy_v2
-This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0693
-- Overall Precision: 0.9664
-- Overall Recall: 0.9732
-- Overall F1: 0.9698
-- Overall Accuracy: 0.9728
-- Accountname F1: 1.0
-- Accountnumber F1: 1.0
-- Age F1: 0.9760
-- Amount F1: 0.9897
-- Bic F1: 0.9978
-- Bitcoinaddress F1: 0.9907
-- Buildingnumber F1: 0.9906
-- City F1: 0.9930
-- Companyname F1: 0.9994
-- County F1: 0.9939
-- Creditcardcvv F1: 1.0
-- Creditcardissuer F1: 0.9891
-- Creditcardnumber F1: 0.9590
-- Currency F1: 0.9052
-- Currencycode F1: 0.9875
-- Currencyname F1: 0.7022
-- Currencysymbol F1: 0.9892
-- Date F1: 0.9126
-- Dob F1: 0.7438
-- Email F1: 1.0
-- Ethereumaddress F1: 1.0
-- Eyecolor F1: 1.0
-- Firstname F1: 0.9934
-- Gender F1: 0.9991
-- Height F1: 1.0
-- Iban F1: 1.0
-- Ip F1: 0.1551
-- Ipv4 F1: 0.8393
-- Ipv6 F1: 0.8034
-- Jobarea F1: 0.9942
-- Jobtitle F1: 0.9993
-- Jobtype F1: 0.9928
-- Lastname F1: 0.9877
-- Litecoinaddress F1: 0.9770
-- Mac F1: 1.0
-- Maskednumber F1: 0.9451
-- Middlename F1: 0.9773
-- Nearbygpscoordinate F1: 1.0
-- Ordinaldirection F1: 0.9924
-- Password F1: 1.0
-- Phoneimei F1: 1.0
-- Phonenumber F1: 1.0
-- Pin F1: 0.9929
-- Prefix F1: 0.9722
-- Secondaryaddress F1: 0.9974
-- Sex F1: 0.9949
-- Ssn F1: 0.9970
-- State F1: 0.9941
-- Street F1: 0.9972
-- Time F1: 0.9967
-- Url F1: 1.0
-- Useragent F1: 1.0
-- Username F1: 0.9991
-- Vehiclevin F1: 1.0
-- Vehiclevrm F1: 1.0
-- Zipcode F1: 0.9890
 ## Model description
-More information needed
 ## Intended uses & limitations
@@ -89,19 +44,84 @@ More information needed
 More information needed
-## Training procedure
-### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 4
-- eval_batch_size: 4
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine_with_restarts
-- lr_scheduler_warmup_ratio: 0.2
 - num_epochs: 7
 ### Training results
@@ -115,10 +135,9 @@ The following hyperparameters were used during training:
 | 0.0808        | 6.0   | 14358 | 0.0693          | 0.9664            | 0.9732         | 0.9698     | 0.9728           | 1.0            | 1.0              | 0.9760 | 0.9897    | 0.9978 | 0.9907            | 0.9906            | 0.9930  | 0.9994         | 0.9939    | 1.0              | 0.9891              | 0.9590              | 0.9052      | 0.9875          | 0.7022          | 0.9892            | 0.9126  | 0.7438 | 1.0      | 1.0                | 1.0         | 0.9934       | 0.9991    | 1.0       | 1.0     | 0.1551 | 0.8393  | 0.8034  | 0.9942     | 0.9993      | 0.9928     | 0.9877      | 0.9770             | 1.0    | 0.9451          | 0.9773        | 1.0                    | 0.9924              | 1.0         | 1.0          | 1.0            | 0.9929 | 0.9722    | 0.9974              | 0.9949 | 0.9970 | 0.9941   | 0.9972    | 0.9967  | 1.0    | 1.0          | 0.9991      | 1.0           | 1.0           | 0.9890     |
 | 0.0779        | 7.0   | 16751 | 0.0697          | 0.9698            | 0.9756         | 0.9727     | 0.9739           | 0.9983         | 1.0              | 0.9815 | 0.9904    | 1.0    | 0.9938            | 0.9935            | 0.9930  | 0.9994         | 0.9935    | 1.0              | 0.9903              | 0.9584              | 0.9206      | 0.9917          | 0.7753          | 0.9914            | 0.9315  | 0.8305 | 1.0      | 1.0                | 1.0         | 0.9939       | 1.0       | 1.0       | 1.0     | 0.1404 | 0.8382  | 0.8029  | 0.9958     | 1.0         | 0.9944     | 0.9910      | 0.9875             | 1.0    | 0.9480          | 0.9788        | 1.0                    | 0.9924              | 1.0         | 1.0          | 1.0            | 0.9929 | 0.9747    | 0.9961              | 0.9949 | 0.9970 | 0.9925   | 0.9983    | 0.9967  | 1.0    | 1.0          | 0.9991      | 1.0           | 1.0           | 0.9953     |
 ### Framework versions
 - Transformers 4.35.2
-- Pytorch 2.1.0+cu121
 - Datasets 2.15.0
-- Tokenizers 0.15.0

 model-index:
 - name: deberta-v3-base_finetuned_ai4privacy_v2
   results: []
+datasets:
+- ai4privacy/pii-masking-200k
+- Isotonic/pii-masking-200k
+language:
+- en
+metrics:
+- seqeval
+pipeline_tag: token-classification
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # deberta-v3-base_finetuned_ai4privacy_v2
+This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the [ai4privacy/pii-masking-200k](https://huggingface.co/datasets/ai4privacy/pii-masking-200k) dataset.
+## Usage
+GitHub Implementation: [Ai4Privacy](https://github.com/Sripaad/ai4privacy)
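A minimal sketch of how the model might be used for masking, assuming the checkpoint is published on the Hub as `Isotonic/deberta-v3-base_finetuned_ai4privacy_v2` (inferred from the repository name) and loads with the standard `transformers` token-classification pipeline. The `mask_pii` and `detect_pii` helpers are hypothetical and are not taken from the linked GitHub implementation; they only show one way to turn predicted spans into placeholders.

```python
from typing import Dict, List


def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` is expected to look like the output of a transformers
    token-classification pipeline with aggregation enabled: dicts that
    carry `start`, `end`, and `entity_group` keys.
    """
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


def detect_pii(text: str) -> List[Dict]:
    """Run the model on `text` (downloads the weights on first call)."""
    # Imported lazily so mask_pii works without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model="Isotonic/deberta-v3-base_finetuned_ai4privacy_v2",  # assumed Hub id
        aggregation_strategy="simple",  # merge sub-word tokens into spans
    )
    return tagger(text)
```

In practice the two helpers would be chained as `mask_pii(text, detect_pii(text))`. The label names that appear in the placeholders come from the model's own configuration, so they may differ from any names used for illustration.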
 ## Model description
+This model has been fine-tuned on the world's largest open-source privacy dataset.
+The purpose of the trained models is to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs.
+The example texts cover 54 PII classes (types of sensitive data) and target 229 discussion subjects / use cases split across business, education, psychology and legal fields, in 5 interaction styles (e.g. casual conversation, formal document, email).
+See the GitHub implementation for the specific research.
 ## Intended uses & limitations
 More information needed
+## Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 6e-04
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 412
+- optimizer: Adam with betas=(0.96,0.996) and epsilon=1e-08
 - lr_scheduler_type: cosine_with_restarts
+- lr_scheduler_warmup_ratio: 0.22
 - num_epochs: 7
+- mixed_precision_training: N/A
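To make the interaction between `learning_rate`, `lr_scheduler_warmup_ratio`, and `cosine_with_restarts` concrete, here is a small pure-Python sketch of the resulting learning-rate curve. It assumes linear warmup followed by cosine decay with hard restarts, which is how this scheduler type is commonly implemented. `TOTAL_STEPS` and `NUM_CYCLES` are hypothetical values chosen for illustration, since the card does not state them for this run.

```python
import math

PEAK_LR = 6e-4        # learning_rate from the list above
WARMUP_RATIO = 0.22   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1000    # hypothetical; not stated in the card
NUM_CYCLES = 1        # assumed; not stated in the card


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine with hard restarts."""
    warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from the peak toward 0, then restarts at the peak.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((NUM_CYCLES * progress) % 1.0)))
    return PEAK_LR * max(0.0, cosine)
```

With a warmup ratio of 0.22, roughly the first fifth of training is spent ramping up to the peak rate before any decay begins.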
+## Class-wise metrics
+The model achieves the following results on the evaluation set:
+- Loss: 0.0211
+- Overall Precision: 0.9722
+- Overall Recall: 0.9792
+- Overall F1: 0.9757
+- Overall Accuracy: 0.9915
+- Accountname F1: 0.9993
+- Accountnumber F1: 0.9986
+- Age F1: 0.9884
+- Amount F1: 0.9984
+- Bic F1: 0.9942
+- Bitcoinaddress F1: 0.9974
+- Buildingnumber F1: 0.9898
+- City F1: 1.0
+- Companyname F1: 1.0
+- County F1: 0.9976
+- Creditcardcvv F1: 0.9541
+- Creditcardissuer F1: 0.9970
+- Creditcardnumber F1: 0.9754
+- Currency F1: 0.8966
+- Currencycode F1: 0.9946
+- Currencyname F1: 0.7697
+- Currencysymbol F1: 0.9958
+- Date F1: 0.9778
+- Dob F1: 0.9546
+- Email F1: 1.0
+- Ethereumaddress F1: 1.0
+- Eyecolor F1: 0.9925
+- Firstname F1: 0.9947
+- Gender F1: 1.0
+- Height F1: 1.0
+- Iban F1: 0.9978
+- Ip F1: 0.5404
+- Ipv4 F1: 0.8455
+- Ipv6 F1: 0.8855
+- Jobarea F1: 0.9091
+- Jobtitle F1: 1.0
+- Jobtype F1: 0.9672
+- Lastname F1: 0.9855
+- Litecoinaddress F1: 0.9949
+- Mac F1: 0.9965
+- Maskednumber F1: 0.9836
+- Middlename F1: 0.7385
+- Nearbygpscoordinate F1: 1.0
+- Ordinaldirection F1: 1.0
+- Password F1: 1.0
+- Phoneimei F1: 0.9978
+- Phonenumber F1: 0.9975
+- Pin F1: 0.9820
+- Prefix F1: 0.9872
+- Secondaryaddress F1: 1.0
+- Sex F1: 0.9916
+- Ssn F1: 0.9960
+- State F1: 0.9967
+- Street F1: 0.9991
+- Time F1: 1.0
+- Url F1: 1.0
+- Useragent F1: 0.9981
+- Username F1: 1.0
+- Vehiclevin F1: 0.9950
+- Vehiclevrm F1: 0.9870
+- Zipcode F1: 0.9966
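As a quick consistency check on the overall numbers, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded overall precision and recall reported in the list. It only verifies the arithmetic and does not reproduce the entity-level evaluation itself; `f1_score` is a helper written for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the list above.
overall_f1 = f1_score(0.9722, 0.9792)
print(round(overall_f1, 4))  # 0.9757, matching the reported Overall F1
```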
 ### Training results
 | 0.0808        | 6.0   | 14358 | 0.0693          | 0.9664            | 0.9732         | 0.9698     | 0.9728           | 1.0            | 1.0              | 0.9760 | 0.9897    | 0.9978 | 0.9907            | 0.9906            | 0.9930  | 0.9994         | 0.9939    | 1.0              | 0.9891              | 0.9590              | 0.9052      | 0.9875          | 0.7022          | 0.9892            | 0.9126  | 0.7438 | 1.0      | 1.0                | 1.0         | 0.9934       | 0.9991    | 1.0       | 1.0     | 0.1551 | 0.8393  | 0.8034  | 0.9942     | 0.9993      | 0.9928     | 0.9877      | 0.9770             | 1.0    | 0.9451          | 0.9773        | 1.0                    | 0.9924              | 1.0         | 1.0          | 1.0            | 0.9929 | 0.9722    | 0.9974              | 0.9949 | 0.9970 | 0.9941   | 0.9972    | 0.9967  | 1.0    | 1.0          | 0.9991      | 1.0           | 1.0           | 0.9890     |
 | 0.0779        | 7.0   | 16751 | 0.0697          | 0.9698            | 0.9756         | 0.9727     | 0.9739           | 0.9983         | 1.0              | 0.9815 | 0.9904    | 1.0    | 0.9938            | 0.9935            | 0.9930  | 0.9994         | 0.9935    | 1.0              | 0.9903              | 0.9584              | 0.9206      | 0.9917          | 0.7753          | 0.9914            | 0.9315  | 0.8305 | 1.0      | 1.0                | 1.0         | 0.9939       | 1.0       | 1.0       | 1.0     | 0.1404 | 0.8382  | 0.8029  | 0.9958     | 1.0         | 0.9944     | 0.9910      | 0.9875             | 1.0    | 0.9480          | 0.9788        | 1.0                    | 0.9924              | 1.0         | 1.0          | 1.0            | 0.9929 | 0.9747    | 0.9961              | 0.9949 | 0.9970 | 0.9925   | 0.9983    | 0.9967  | 1.0    | 1.0          | 0.9991      | 1.0           | 1.0           | 0.9953     |
 ### Framework versions
 - Transformers 4.35.2
+- Pytorch 2.1.0+cu118
 - Datasets 2.15.0
+- Tokenizers 0.15.0