sarahyurick committed on
Commit
7556b38
1 Parent(s): 76375f3

Add "How to Use in Transformers" section

Files changed (1)
  1. README.md +95 -0
README.md CHANGED
@@ -75,6 +75,101 @@ Success is defined as having an acceptable catch rate (recall scores for each at
  The inference code is available on [NeMo Curator's GitHub repository](https://github.com/NVIDIA/NeMo-Curator). <br>
  Check out [this example notebook](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/distributed_data_classification) to get started.
 
+ ## How to Use in Transformers:
+ To use this AEGIS-based classifier, you must first get access to Llama Guard on Hugging Face here: https://huggingface.co/meta-llama/LlamaGuard-7b. Afterwards, set up a [user access token](https://huggingface.co/docs/hub/en/security-tokens) and pass that token when loading the Llama Guard base model.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from huggingface_hub import hf_hub_download
+ from peft import PeftModel
+ from safetensors.torch import load_file
+ from torch.nn import Dropout, Linear
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the Llama Guard base model and apply the AEGIS adapter
+ pretrained_model_name_or_path = "meta-llama/LlamaGuard-7b"
+ dtype = torch.bfloat16
+ token = "hf_1234"  # Replace with your user access token
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ base_model = AutoModelForCausalLM.from_pretrained(
+     pretrained_model_name_or_path, torch_dtype=dtype, token=token
+ ).to(device)
+ peft_model_name_or_path = "nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0"
+ model = PeftModel.from_pretrained(base_model, peft_model_name_or_path)
+
+ # Initialize tokenizer (the gated base repository needs the token here too)
+ tokenizer = AutoTokenizer.from_pretrained(
+     pretrained_model_name_or_path=pretrained_model_name_or_path,
+     padding_side="left",
+     token=token,
+ )
+ tokenizer.pad_token = tokenizer.unk_token
+
+ class InstructionDataGuardNet(torch.nn.Module):
+     def __init__(self, input_dim, dropout=0.7):
+         super().__init__()
+         self.input_dim = input_dim
+         self.dropout = Dropout(dropout)
+         self.sigmoid = torch.nn.Sigmoid()
+         self.input_layer = Linear(input_dim, input_dim)
+
+         self.hidden_layer_0 = Linear(input_dim, 2000)
+         self.hidden_layer_1 = Linear(2000, 500)
+         self.hidden_layer_2 = Linear(500, 1)
+
+     def forward(self, x):
+         x = F.normalize(x, dim=-1)
+         x = self.dropout(x)
+         x = F.relu(self.input_layer(x))
+         x = self.dropout(x)
+         x = F.relu(self.hidden_layer_0(x))
+         x = self.dropout(x)
+         x = F.relu(self.hidden_layer_1(x))
+         x = self.dropout(x)
+         x = self.hidden_layer_2(x)
+         x = self.sigmoid(x)
+         return x
+
+ # Load Instruction-Data-Guard classifier
+ instruction_data_guard = InstructionDataGuardNet(4096).to(device)
+ weights_path = hf_hub_download(
+     repo_id="nvidia/instruction-data-guard",
+     filename="model.safetensors",
+ )
+ state_dict = load_file(weights_path)
+ instruction_data_guard.load_state_dict(state_dict)
+ instruction_data_guard = instruction_data_guard.eval()  # disables dropout
+
+ # Function to compute results
+ def get_instruction_data_guard_results(
+     prompts,
+     tokenizer,
+     model,
+     instruction_data_guard,
+     device="cuda",
+ ):
+     inputs = tokenizer(prompts, padding=True, return_tensors="pt").to(device)
+     outputs = model.generate(
+         **inputs,
+         output_hidden_states=True,
+         return_dict_in_generate=True,
+         max_new_tokens=1,
+         pad_token_id=tokenizer.pad_token_id,
+     )
+     # hidden_states[0]: states for the first generated token;
+     # [32]: last decoder layer; [:, -1, :]: final prompt position
+     input_tensor = outputs.hidden_states[0][32][:, -1, :].to(torch.float)
+     return instruction_data_guard(input_tensor).flatten().detach().cpu().numpy()
+
+ # Prepare sample input
+ instruction = "Find a route between San Diego and Phoenix which passes through Nevada"
+ input_ = ""
+ response = "Drive to Las Vegas with highway 15 and from there drive to Phoenix with highway 93"
+ benign_sample = f"Instruction: {instruction}. Input: {input_}. Response: {response}."
+ text_samples = [benign_sample]
+ # Pass the detected device so the example also runs without a GPU
+ poisoning_scores = get_instruction_data_guard_results(
+     text_samples, tokenizer, model, instruction_data_guard, device=device
+ )
+ print(poisoning_scores)
+ # [0.01149639]
+ ```
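
The snippet above hard-codes a placeholder access token. As a minimal sketch, and assuming the token has been exported in an environment variable (`HF_TOKEN` is used here; the helper name `read_hf_token` is hypothetical and not part of any library), the value could instead be read at runtime:

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in `env_var`, or raise if it is missing.

    Hypothetical helper for illustration; not part of transformers or huggingface_hub.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face user access token first."
        )
    return token
```

The returned string can then be assigned to `token` in place of the `"hf_1234"` placeholder.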
+
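The sample above formats a single record by hand. For scoring several records in one call, a small helper can build the same `Instruction: ... Input: ... Response: ...` strings and flag scores above a cutoff. This is only a sketch: `format_sample`, `flag_suspicious`, and the 0.5 cutoff are illustrative assumptions, not part of the model's API and not a recommended threshold.

```python
def format_sample(instruction, input_="", response=""):
    """Build the prompt string in the same layout as the example above."""
    return f"Instruction: {instruction}. Input: {input_}. Response: {response}."


def flag_suspicious(scores, threshold=0.5):
    """Return one boolean per score; True means the score exceeds the cutoff.

    The default threshold is an assumption chosen for illustration only.
    """
    return [float(score) > threshold for score in scores]


records = [
    {"instruction": "Translate to French", "input_": "Good morning", "response": "Bonjour"},
    {"instruction": "Name a prime number", "response": "7"},
]
text_samples = [format_sample(**record) for record in records]
print(text_samples[1])
# Instruction: Name a prime number. Input: . Response: 7.
```

The resulting `text_samples` list can be passed to `get_instruction_data_guard_results`, and the returned array to `flag_suspicious`.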
  ## Ethical Considerations:
  NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.