Update README.md
Browse files
README.md
CHANGED
@@ -5,43 +5,24 @@ language:
|
|
5 |
- en
|
6 |
tags:
|
7 |
- promptinjection
|
|
|
8 |
---
|
9 |
-
|
10 |
-
# Model Card for LLM-Prompt-Injection-Detection
|
11 |
|
12 |
## Model Description
|
13 |
-
This model
|
14 |
|
15 |
-
##
|
16 |
-
|
17 |
|
18 |
## Training Data
|
19 |
-
The
|
20 |
-
|
21 |
-
## Model Architecture
|
22 |
-
The model leverages the `bert-base-uncased` architecture, which has been proven effective for a wide range of natural language processing tasks. This choice provides a solid foundation for understanding the nuanced patterns that differentiate regular text from prompt injections.
|
23 |
-
|
24 |
-
## Training Procedure
|
25 |
-
Training was conducted on a split of 80% of the dataset for training and 20% for testing, ensuring a comprehensive learning process. The model was trained for 5 epochs, using the Adam optimizer with a learning rate of 0.0001. Training utilized a batch size of 16 for both training and evaluation phases. Performance was monitored through accuracy, precision, recall, and F1 score, allowing for a detailed understanding of the model's capabilities.
|
26 |
|
27 |
-
##
|
28 |
-
|
29 |
-
|
30 |
-
## Limitations and Considerations
|
31 |
-
While the model provides a valuable tool for detecting prompt injection attacks, it is not infallible. Users should be aware of the potential for false positives and false negatives. The model's performance can vary based on the specific characteristics of the input data and the complexity of potential attacks. It is recommended to use this model as part of a comprehensive security strategy that includes manual review and other automated checks.
|
32 |
-
|
33 |
-
## How to Use
|
34 |
-
This model can be loaded and used for inference using the Hugging Face Transformers library. Example code for loading the model:
|
35 |
-
|
36 |
-
```python
|
37 |
-
from transformers import AutoModelForSequenceClassification, AutoTokenizer
|
38 |
|
39 |
-
|
40 |
-
model
|
41 |
-
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
42 |
|
43 |
-
|
44 |
-
|
45 |
-
inputs = tokenizer(text, return_tensors="pt")
|
46 |
-
outputs = model(**inputs)
|
47 |
-
prediction = outputs.logits.argmax(-1)
|
|
|
5 |
- en
|
6 |
tags:
|
7 |
- promptinjection
|
8 |
+
- distilbert
|
9 |
---
|
10 |
+
# Model Card for DistilBERT-PromptInjectionDetectorForCVs
|
|
|
11 |
|
12 |
## Model Description
|
13 |
+
This DistilBERT-based model was developed as part of a research project aiming to mitigate prompt injection attacks in applications processing CVs. It specifically targets the nuanced domain of CV submissions, demonstrating a strategy to distinguish between legitimate CVs and those containing prompt injection attempts.
|
14 |
|
15 |
+
## Research and Application Context
|
16 |
+
The model was created in the context of demonstrating a synthetic application handling CVs, showcasing a domain-specific approach to mitigate prompt injection attacks. This work, including the model and its underlying strategy, is detailed in our [research blog](http://placeholder) and the synthetic application can be accessed [here](http://placeholder).
|
17 |
|
18 |
## Training Data
|
19 |
+
The model was fine-tuned on a custom dataset that combines domain-specific examples (legitimate CVs) with prompt injection examples to create a more tailored dataset. This dataset includes legitimate CVs, pure prompt injection texts, and CVs with embedded prompt injection attempts. The original datasets used are available on Hugging Face: [Resume Dataset](https://huggingface.co/datasets/Lakshmi12/Resume_Dataset) for CVs and [Prompt Injections](https://huggingface.co/datasets/deepset/prompt-injections) for injection examples.
|
|
|
|
|
|
|
|
|
|
|
|
|
20 |
|
21 |
+
## Intended Use
|
22 |
+
This model is not intended for production use but serves as a demonstration of a domain-specific strategy to mitigate prompt injection attacks. It should be employed as part of a broader security strategy, including securing the model's output, as described in our article. This approach is meant to showcase how to address prompt injection risks in a targeted application scenario.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
23 |
|
24 |
+
## Limitations and Ethical Considerations
|
25 |
+
Prompt injection in Large Language Models (LLMs) remains an open problem with no deterministic solution. While this model offers a mitigation strategy, it's important to understand that new ways to perform injection attacks may still be possible. Users should consider this model as an example of how to approach mitigation in a specific domain, rather than a definitive solution.
|
|
|
26 |
|
27 |
+
## License and Usage
|
28 |
+
The model and datasets are shared for research purposes, encouraging further exploration and development of mitigation strategies against prompt injection attacks. Users are encouraged to refer to the specific licenses of the datasets and the model for more details on permissible use cases.
|
|
|
|
|
|