Business Safety Classifier - for demo purposes only

Please read the disclaimers below carefully before downloading and using this model!

Model Description

This is a logistic regression model developed by Intel to demonstrate that such a lightweight model can be trained to classify whether a piece of text contains business-sensitive information. You can refer to the OPEA guardrail microservice webpage to learn more about the demo deployment of such a model in a guardrail microservice as part of a GenAI application. A minimal inference sketch is included after the list below.

  • Developed by: Intel
  • Model type: logistic regression classifier in pickled format
  • License: [To be discussed with BU Legal]
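For illustration, here is a minimal inference sketch under the setup described above. The pickle file name "business_safety_lr.pkl" and the label convention (1 = business-sensitive) are assumptions, not part of the published artifact; the embedding call follows nomic-embed-text-v1 usage (a task prefix plus trust_remote_code).

```python
# A minimal usage sketch, not an official API: the pickle file name and the
# 1 = business-sensitive label convention are assumptions.
import pickle

from sentence_transformers import SentenceTransformer

# Embedding model used at training time; nomic-embed-text-v1 needs
# trust_remote_code and a task prefix on its inputs.
embedder = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Load the pickled logistic regression classifier (hypothetical file name).
with open("business_safety_lr.pkl", "rb") as f:
    clf = pickle.load(f)

text = "Q3 revenue projections are attached; do not share outside the team."
embedding = embedder.encode(["classification: " + text])
print("business sensitive" if clf.predict(embedding)[0] == 1 else "not business sensitive")
```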

Training Details

  1. Dataset: Patronus EnterprisePII dataset.
  2. Dataset preprocessing: extract the text and golden labels from the original dataset.
  3. Embedding model: nomic-ai/nomic-embed-text-v1. The embedding model was used as-is without any fine-tuning.
  4. Annotation LLM: mistralai/Mixtral-8x7B-Instruct-v0.1. The LLM was used as-is without any fine-tuning.
  5. Dataset annotation: used the Annotation LLM to generate labels for the samples in the dataset. The label is 1 if the LLM indicates that the text contains business-sensitive information, else the label is 0.
  6. The LLM annotation accuracy with respect to the golden labels is shown in the Evaluation section below. We used LLM annotation to demonstrate the feasibility of using LLMs to generate high-quality labels in potential use cases where no labeled text is available for training. Note: the LLM annotations have not been validated by human experts; instead, we compared the LLM-annotated labels with the golden labels provided by the original dataset and observed good precision and recall.
  7. Training process: 1) split the dataset into train/test sets (the test set is about 10% of the total data); 2) embed the training data with the embedding model; 3) feed the embeddings into the logistic regression (LR) classifier, using the LLM-annotated labels from the dataset to train the LR classifier from scratch. A condensed sketch follows this list.
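Below is a condensed sketch of steps 1, 3, and 7 above, assuming a sentence-transformers loader for the embedding model and scikit-learn for the classifier. The toy texts, labels, variable names, and solver settings are illustrative, not the actual training configuration; in the real run the texts and 0/1 labels come from the Patronus EnterprisePII dataset and the Annotation LLM.

```python
# A condensed sketch of the training recipe. Toy data, test fraction, and
# solver settings are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

embedder = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

texts = [  # toy stand-ins for the Patronus EnterprisePII samples
    "Q3 revenue forecast attached, internal distribution only.",
    "Merger negotiations with the acquisition target are ongoing.",
    "The board approved the confidential restructuring plan.",
    "The park was lovely this weekend.",
    "I enjoyed the conference keynote.",
    "Lunch options near the office are great.",
]
llm_labels = [1, 1, 1, 0, 0, 0]  # 1 = business-sensitive per the Annotation LLM

# Step 1 of the training process: hold out ~10% of the data for testing.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, llm_labels, test_size=0.1, random_state=0
)

# Step 2: embed the training texts (nomic-embed-text-v1 uses task prefixes).
X_train = embedder.encode(["classification: " + t for t in train_texts])

# Step 3: train the logistic regression classifier from scratch on the
# embeddings, using the LLM-annotated labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
```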

Evaluation

LLM annotation accuracy (entire dataset)

The LLM annotation accuracy was evaluated on the entire Patronus EnterprisePII dataset, with accuracy calculated with respect to the dataset's golden labels. The metrics collected during the annotation runs are shown below.

Metric      Value
Accuracy    0.909
Precision   0.883
Recall      0.940
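For reference, here is a hedged sketch of how the annotation step (items 4 and 5 in Training Details) could be implemented with the transformers text-generation pipeline. The prompt wording and the yes/no parsing rule are assumptions; the actual annotation prompt is not published.

```python
# A hedged sketch of LLM annotation. The prompt and the yes/no parsing rule
# are assumptions; running Mixtral-8x7B locally needs substantial GPU memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    device_map="auto",
)

def annotate(text: str) -> int:
    prompt = (
        "[INST] Does the following text contain business sensitive "
        f"information? Answer yes or no.\n\n{text} [/INST]"
    )
    out = generator(prompt, max_new_tokens=5, return_full_text=False)
    return 1 if "yes" in out[0]["generated_text"].lower() else 0  # 1 = sensitive
```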

LR classifier accuracy (test split)

We evaluated the LR classifier on our test split of the Patronus EnterprisePII dataset, which has no overlap with the training split. The metrics on the test set are shown below. Interestingly, although the classifier was trained with LLM-annotated labels, it performed perfectly on the 300 test samples when using the golden labels from the original dataset as the reference, while it achieved slightly lower but still very good accuracy (around 0.9) when using the LLM annotations (which it was trained on) as the reference. This suggests that the LR classifier did not overfit to the LLM-annotated labels.

                                   Accuracy   Precision   Recall
Compared to golden labels          1.0        1.0         1.0
Compared to LLM-annotated labels   0.903      0.927       0.886
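The table above can be reproduced in spirit with scikit-learn's metric functions. The sketch below continues the training sketch from Training Details; golden_test is an illustrative stand-in for the dataset's golden labels on the held-out split.

```python
# Continues the training sketch above; golden_test is an illustrative stand-in
# for the dataset's golden labels on the held-out split.
from sklearn.metrics import accuracy_score, precision_score, recall_score

X_test = embedder.encode(["classification: " + t for t in test_texts])
preds = clf.predict(X_test)

golden_test = y_test  # replace with the real golden labels for the test split
for name, ref in [("golden labels", golden_test), ("LLM annotations", y_test)]:
    print(f"vs {name}: "
          f"acc={accuracy_score(ref, preds):.3f}  "
          f"prec={precision_score(ref, preds, zero_division=0):.3f}  "
          f"rec={recall_score(ref, preds, zero_division=0):.3f}")
```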

Important Notices and Disclaimers

  1. The accuracy, precision, and recall metrics obtained for this reference implementation should not be treated as a goal or threshold for applied implementations, or as a judgment of what adequate performance ought to be. Each applied implementation should determine its own performance thresholds prior to deployment.
  2. The types of sensitive information contained in the Patronus EnterprisePII dataset are not exhaustive and may not cover certain types of sensitive information that are important for your applications. Therefore, the LR classifier trained on the Patronus EnterprisePII dataset may not deliver satisfactory detection accuracy, precision, or recall for your applications.
  3. The model does not support any language other than English.
  4. This model serves as a demo model for further testing and development of classifiers that detect the presence of business-sensitive information, including personally identifiable information (PII).
  5. This model is intended to allow users to examine and evaluate the model and the performance of Intel technology solutions. The accuracy of computer models is a function of the relation between the data used to train them and the data that the models encounter after deployment. This model has been tested using datasets that may or may not be sufficient for use in production applications. Accordingly, while the model may serve as a strong foundation, Intel recommends and requests that this model be tested against the data it is likely to encounter in specific deployments.
  6. There are no publicly available fairness metrics for the models and datasets that served as inputs to this model. Further testing is needed to determine whether PII is identified equally successfully across different demographic groups.
  7. This model should not be used without further testing, or without human oversight and review of the outputs to ensure PII and other sensitive items are fully removed. This model should not be used in situations where the consequences of inaccuracy are high. It is not appropriate to use this model as part of any investigation of employee conduct.
  8. Human Rights Disclaimer: Intel is committed to respecting human rights and avoiding causing or directly contributing to adverse impacts on human rights. See Intel’s Global Human Rights Policy. The model licensed from Intel is intended for socially responsible applications and should not be used to cause or contribute to a violation of internationally recognized human rights.