ab-ai committed on
Commit
5fa4907
1 Parent(s): f62ed74

Update README.md

Files changed (1)
  1. README.md +49 -14
README.md CHANGED
@@ -1,17 +1,52 @@
  ---
- pipeline_tag: token-classification
  tags:
- - PII
- - NER
  ---
- This model is a fine-tuned version of distilbert/distilbert-base-uncased on the generator dataset.
- It achieves the following results on the evaluation set:
-
- | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 Score | Accuracy |
- |-------|---------------|-----------------|-----------|--------|----------|----------|
- | 1 | 0.093200 | 0.071345 | 0.905912 | 0.933906 | 0.919696 | 0.968861 |
- | 2 | 0.067100 | 0.069066 | 0.912207 | 0.936481 | 0.924185 | 0.969175 |
- | 3 | 0.060000 | 0.063845 | 0.930860 | 0.947639 | 0.939175 | 0.970566 |
- | 4 | 0.043600 | 0.086811 | 0.925738 | 0.941631 | 0.933617 | 0.965496 |
- | 5 | 0.025400 | 0.118688 | 0.931181 | 0.940773 | 0.935952 | 0.965092 |
- | 6 | 0.012400 | 0.154431 | 0.931181 | 0.940773 | 0.935952 | 0.971463 |
  ---
+ language: en
  tags:
+ - token-classification
+ - pii-detection
+ license: apache-2.0
+ datasets:
+ - custom_dataset
  ---
+
+ # Model Name
+
+ PII Detection Model Based on DistilBERT
+
+ ## Model description
+
+ This model is a token classification model trained for detecting personally identifiable information (PII) entities such as names, addresses, dates of birth, credit card numbers, etc. The model is based on the DistilBERT architecture and has been fine-tuned on a custom dataset for PII detection.
+
+ ## Intended use
+
+ The model is intended to be used for automatically identifying and extracting PII entities from text data. It can be incorporated into data processing pipelines for tasks such as data anonymization, redaction, compliance with privacy regulations, etc.
+
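The redaction use case named in the added "Intended use" text can be illustrated with a minimal sketch. It is not part of the commit. It assumes entity spans shaped like the output of the Hugging Face `transformers` token-classification pipeline with `aggregation_strategy="simple"` (dicts carrying `start`, `end`, and `entity_group`). The `redact` helper, the sample sentence, and the `NAME` and `EMAIL` labels are hypothetical, since the README does not list the model's label set.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], mask: str = "[{label}]") -> str:
    """Replace each detected span in `text` with a label placeholder.

    `entities` is assumed to hold dicts with `start` and `end` character
    offsets and an `entity_group` label, as an aggregated
    token-classification pipeline would return them.
    """
    redacted = text
    # Work right to left so earlier character offsets remain valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = mask.format(label=ent["entity_group"])
        redacted = redacted[: ent["start"]] + placeholder + redacted[ent["end"]:]
    return redacted


# Hand-written spans standing in for model output; the labels are hypothetical.
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"entity_group": "NAME", "start": 8, "end": 16},
    {"entity_group": "EMAIL", "start": 20, "end": 36},
]
print(redact(sample_text, sample_entities))  # Contact [NAME] at [EMAIL].
```

The sketch does not handle overlapping spans, so a real pipeline would likely need to merge or filter them before masking.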
+ ## Evaluation results
+
+ The model's performance was evaluated on a held-out validation set using the following metrics:
+
+ - Precision: 0.93
+ - Recall: 0.94
+ - F1 Score: 0.94
+ - Accuracy: 0.97
+
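The rounded metrics above appear to correspond to the epoch 6 row of the table this commit removes. As a quick consistency check, and not something taken from the commit itself, the F1 score can be recomputed as the harmonic mean of precision and recall. The `f1_score` helper below is a hypothetical name for that standard formula.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded epoch 6 precision and recall from the table removed in this commit.
f1 = f1_score(0.931181, 0.940773)
print(f"{f1:.6f}")  # 0.935952
print(f"{f1:.2f}")  # 0.94
```

Recomputing from the already rounded 0.93 and 0.94 gives roughly 0.935, which would round down to 0.93, so the reported 0.94 only follows from the unrounded values.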
+ ## Limitations and bias
+
+ - The model's performance may vary depending on the quality and diversity of the input data.
+ - It may exhibit biases present in the training data, such as overrepresentation or underrepresentation of certain demographic groups or types of PII.
+ - The model may struggle with detecting PII entities in noisy or poorly formatted text.
+
+ ## Ethical considerations
+
+ - Care should be taken when deploying the model in production to ensure that it does not inadvertently expose sensitive information or violate individuals' privacy rights.
+ - Data used to train and evaluate the model should be handled with caution to avoid the risk of exposing PII.
+ - Regular monitoring and auditing of the model's predictions may be necessary to identify and mitigate any potential biases or errors.
+
+ ## Authors
+
+ - Your Name (Your email or contact information)
+
+ ## References
+
+ - Link to the GitHub repository or codebase where the model was trained
+ - Any relevant papers or resources related to PII detection and token classification
+