---
license: apache-2.0
widget:
- text: "Check for sensitive information: Dear Team, I wanted to update you all about our new client. We've signed a contract with Huggingsplace Corp. and we're extremely excited to start working with them. They'll be investing about $30 million into our product and we hope to see a substantial rise in our revenue. Our new client has specific requirements, their confidential project details, which contains proprietary technology are as follows "
---

# DataPrivacyComplianceCheck-3B-V0.9

# License

This Natural Language Processing (NLP) model is made available under the Apache License, Version 2.0. You are free to use, modify, and distribute this software according to the terms and conditions of the Apache 2.0 License. For the full license text, please refer to the Apache 2.0 License.

# Usage and Specific Capabilities

## Text Length Limitation

The model is optimized to analyze texts containing up to 512 tokens. If your text exceeds this limit, we recommend splitting it into smaller chunks of no more than 512 tokens each and processing each chunk separately (see the chunking sketch at the end of this card).

## Supported Languages

Bulgarian, Chinese, Czech, Dutch, English, Estonian, Finnish, French, German, Greek, Indonesian, Italian, Japanese, Korean, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish

# Use Cases

## Data Privacy and Compliance

This model is designed to screen text for sensitive data and trade secrets. By doing so, it helps organizations remain compliant with data privacy laws and reduces the risk of accidental exposure of confidential information.

# Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("metricspace/DataPrivacyComplianceCheck-3B-V0.9")
model = AutoModelForCausalLM.from_pretrained(
    "metricspace/DataPrivacyComplianceCheck-3B-V0.9",
    torch_dtype=torch.bfloat16,
).to("cuda")  # move the model to the same device as the inputs

text_to_check = "John, our patient, felt a throbbing headache and dizziness for two weeks. He was immediately..."
prompt = f"Check for sensitive information: {text_to_check}"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

max_new_tokens = 512
outputs = model.generate(inputs.input_ids, max_new_tokens=max_new_tokens)
result = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]

print(result)
```

…

# Dataset and Training Documentation for Audit

If you require the original dataset used for training this model, or further documentation related to its training and architecture for audit purposes, you can request this information by contacting us.

# Further Tuning Services for Custom Use Cases

For specialized needs or custom use cases, we offer further tuning services to adapt the model to your specific requirements. To inquire about these services, please reach out to us at:

📧 Email: info@metric-space.ai

Please note that the availability of the dataset, additional documentation, and tuning services may be subject to certain conditions and limitations.
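
# Example: Splitting Long Texts into Chunks

As noted under "Text Length Limitation", texts longer than 512 tokens should be split before checking. The sketch below is one possible way to do this with the model's own tokenizer; the `split_into_chunks` helper and the reserved-token margin are illustrative assumptions, not part of the model's API.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("metricspace/DataPrivacyComplianceCheck-3B-V0.9")

PROMPT_PREFIX = "Check for sensitive information: "


def split_into_chunks(text: str, max_tokens: int = 480):
    """Split `text` into pieces whose token count stays within `max_tokens`.

    `max_tokens` is kept below 512 to leave room for the prompt prefix and
    special tokens; the exact margin is an assumption, adjust as needed.
    """
    token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    for start in range(0, len(token_ids), max_tokens):
        chunk_ids = token_ids[start:start + max_tokens]
        chunks.append(tokenizer.decode(chunk_ids))
    return chunks


long_text = "..."  # your document here
for chunk in split_into_chunks(long_text):
    prompt = f"{PROMPT_PREFIX}{chunk}"
    # feed `prompt` to the model exactly as in the Example Usage section above
```

Note that a fixed-size split can cut sentences in half; splitting on paragraph or sentence boundaries may give better results for your data.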