ab-ai/pii_model_based_on_distilbert

Token Classification


ab-ai committed on Mar 16

Commit 5fa4907 • 1 Parent(s): f62ed74

Update README.md

Files changed (1)

README.md +49 -14


@@ -1,17 +1,52 @@
 ---
-pipeline_tag: token-classification
 tags:
-- PII
-- NER
 ---
-This model is a fine-tuned version of distilbert/distilbert-base-uncased on the generator dataset.
-It achieves the following results on the evaluation set:
-| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 Score | Accuracy |
-|-------|---------------|-----------------|-----------|--------|----------|----------|
-|   1   |    0.093200   |     0.071345    |  0.905912 | 0.933906 | 0.919696 | 0.968861 |
-|   2   |    0.067100   |     0.069066    |  0.912207 | 0.936481 | 0.924185 | 0.969175 |
-|   3   |    0.060000   |     0.063845    |  0.930860 | 0.947639 | 0.939175 | 0.970566 |
-|   4   |    0.043600   |     0.086811    |  0.925738 | 0.941631 | 0.933617 | 0.965496 |
-|   5   |    0.025400   |     0.118688    |  0.931181 | 0.940773 | 0.935952 | 0.965092 |
-|   6   |    0.012400   |     0.154431    |  0.931181 | 0.940773 | 0.935952 | 0.971463 |

 ---
+language: en
 tags:
+- token-classification
+- pii-detection
+license: apache-2.0
+datasets:
+- custom_dataset
 ---
+# Model Name
+PII Detection Model Based on DistilBERT
+## Model description
+This is a token classification model for detecting personally identifiable information (PII) entities such as names, addresses, dates of birth, and credit card numbers. It is based on the DistilBERT architecture and was fine-tuned on a custom dataset for PII detection.
+## Intended use
+The model is intended for automatically identifying and extracting PII entities from text. It can be incorporated into data processing pipelines for tasks such as data anonymization, redaction, and compliance with privacy regulations.
+## Evaluation results
+The model's performance was evaluated on a held-out validation set using the following metrics:
+- Precision: 0.93
+- Recall: 0.94
+- F1 Score: 0.94
+- Accuracy: 0.97
+## Limitations and bias
+- The model's performance may vary depending on the quality and diversity of the input data.
+- It may exhibit biases present in the training data, such as overrepresentation or underrepresentation of certain demographic groups or types of PII.
+- The model may struggle with detecting PII entities in noisy or poorly formatted text.
+## Ethical considerations
+- Care should be taken when deploying the model in production to ensure that it does not inadvertently expose sensitive information or violate individuals' privacy rights.
+- Data used to train and evaluate the model should be handled with caution to avoid the risk of exposing PII.
+- Regular monitoring and auditing of the model's predictions may be necessary to identify and mitigate any potential biases or errors.
+## Authors
+- Your Name (Your email or contact information)
+## References
+- Link to the GitHub repository or codebase where the model was trained
+- Any relevant papers or resources related to PII detection and token classification
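
The "Intended use" section of the new README mentions redaction inside a data processing pipeline but gives no usage snippet. The following is a minimal sketch of what that could look like, not the author's code. It assumes the model loads through the standard `transformers` token-classification pipeline under the repo id shown on this page, and that entities come back in the pipeline's aggregated form (dicts with `entity_group`, `start`, `end`). The helper names `redact` and `detect_pii` and the label names `NAME` and `ADDRESS` are hypothetical; the README does not list the model's actual label set.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], placeholder: str = "[{label}]") -> str:
    """Replace each detected span in `text` with a label placeholder.

    `entities` is assumed to follow the shape produced by the transformers
    token-classification pipeline with an aggregation strategy: dicts with
    `entity_group`, `start`, and `end` keys (character offsets).
    Overlapping spans are not handled in this sketch.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = ent.get("entity_group", "PII")
        text = text[: ent["start"]] + placeholder.format(label=label) + text[ent["end"]:]
    return text


def detect_pii(text: str, model_id: str = "ab-ai/pii_model_based_on_distilbert") -> List[Dict]:
    """Run the model via the transformers pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: optional dependency

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )
    return tagger(text)


if __name__ == "__main__":
    sample = "Contact Jane Doe at 12 Main Street."
    # Hand-written spans standing in for model output; label names are hypothetical.
    fake_entities = [
        {"entity_group": "NAME", "start": 8, "end": 16},
        {"entity_group": "ADDRESS", "start": 20, "end": 34},
    ]
    print(redact(sample, fake_entities))  # Contact [NAME] at [ADDRESS].
```

In real use, `redact(text, detect_pii(text))` would chain the two steps. Keeping detection and redaction separate makes it possible to audit the model's predictions before any text is altered, which fits the monitoring advice in the README's "Ethical considerations" section.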