Update README.md
README.md
CHANGED
@@ -176,23 +176,122 @@ Looking forward to your direction on this matter.

# Example: Process anonymized version with GPT-4 and change entities back
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("metricspace/EntityAnonymization-3B-V0.9")
model = AutoModelForCausalLM.from_pretrained("metricspace/EntityAnonymization-3B-V0.9", torch_dtype=torch.bfloat16)

import re

match = re.search(r'ASSISTANT:', input_text)

import ast

@@ -208,59 +307,57 @@ def swap_keys_and_values_in_string(input_str):

    return swapped_str

# sample text for entity extraction and resampling

We
'''

inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
outputs = model.generate(inputs.input_ids, max_new_tokens=250, do_sample=False, top_k=50, top_p=0.98, num_beams=1)
output_text_1 = tokenizer.decode(outputs[0], skip_special_tokens=True)

# {'XYZ Biotech': 'ABC Pharmaceuticals', 'CipherGuard': 'CodeGuard', 'BioShield': 'BioProtect', 'SecureLabs': 'SecureTech', 'BioDiscover': 'BioDiscover'}
# inverted to this:
# {'ABC Pharmaceuticals': 'XYZ Biotech', 'CodeGuard': 'CipherGuard', 'BioProtect': 'BioShield', 'SecureTech': 'SecureLabs', 'BioDiscover': 'BioDiscover'}

inputs = tokenizer(prompt_2, return_tensors='pt').to('cuda')
outputs = model.generate(inputs.input_ids, max_new_tokens=500, do_sample=False, top_k=50, top_p=0.98)
output_text_2 = tokenizer.decode(outputs[0], skip_special_tokens=True)

'''

This comprehensive initiative goes beyond regulatory compliance and entails the implementation of three crucial security measures. We will be leveraging the cutting-edge encryption technology provided by CipherGuard to secure our research data, implementing BioShield's advanced security protocols for laboratory access, and integrating the real-time data monitoring and threat detection systems offered by SecureLabs.
The enhancement of our regulatory affairs and data security measures is a pivotal component in safeguarding our proprietary research, reinforcing our commitment to excellence in drug development. While we prioritize compliance and data protection, the user experience for our research teams and partners will remain user-friendly and efficient, whether they are using our proprietary research software, "BioDiscover," or our mobile applications.
We are actively seeking a regulatory affairs specialist who comprehends the critical importance of upholding compliance and data security in our industry and possesses the expertise to deliver a comprehensive and impervious solution that ensures not only our adherence to regulations but also preserves the confidentiality of our research data at XYZ Biotech.
'''
```
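
Both versions of the example call `swap_keys_and_values_in_string`, whose body is collapsed in this diff; the removed comments in the old version show its effect, turning `{'XYZ Biotech': 'ABC Pharmaceuticals', ...}` into `{'ABC Pharmaceuticals': 'XYZ Biotech', ...}`. The following is a minimal, hypothetical sketch of such a function, not the repository's code. It assumes the map is a dict literal of strings (as in the README's `entities_map`) and that it is parsed with `ast.literal_eval`; the `import ast` suggests this, but the source does not confirm it.

```python
import ast


def swap_keys_and_values_in_string(input_str: str) -> str:
    """Hypothetical sketch: invert a dict literal that is stored as a string.

    Assumes input_str holds a dict literal of strings such as
    "{'XYZ Biotech': 'ABC Pharmaceuticals'}". Surrounding whitespace and
    a trailing comma after the last entry are accepted.
    """
    # literal_eval parses Python literals only; it never executes code
    mapping = ast.literal_eval(input_str.strip())
    if not isinstance(mapping, dict):
        raise ValueError("expected a dict literal")
    # Values become keys; if two keys share a value, the later key wins
    swapped = {value: key for key, value in mapping.items()}
    # str() of a dict of strings is itself a literal that literal_eval can read
    swapped_str = str(swapped)
    return swapped_str


example = "{'XYZ Biotech': 'ABC Pharmaceuticals', 'CipherGuard': 'CodeGuard', 'BioDiscover': 'BioDiscover'}"
print(swap_keys_and_values_in_string(example))
# {'ABC Pharmaceuticals': 'XYZ Biotech', 'CodeGuard': 'CipherGuard', 'BioDiscover': 'BioDiscover'}
```

Returning a string rather than a dict matches how the new example uses the result: it is interpolated directly into the `Rephrase with {entities_map}` prompt.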

…

# Dataset and Training Documentation for Audit
If you require the original dataset used for training this model, or further documentation related to its training and architecture for audit purposes, you can request this information by contacting us.

# Example: Process anonymized version with GPT-4 and change entities back
```python
import torch
import json
import re
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("metricspace/EntityAnonymization-3B-V0.9")
model = AutoModelForCausalLM.from_pretrained("metricspace/EntityAnonymization-3B-V0.9", torch_dtype=torch.bfloat16)
model.to("cuda:0")

# Anonymized input
anonymized_text = '''Subject: HR Incident Report: Speculation of Drug Misuse by Mr. Edward Martin

Dear Mrs. Charlotte Johnson,

I trust you're well. I wish to bring to your attention a concerning matter involving one of our esteemed employees, Mr. Edward Martin.

Employee Details:

Name: Edward Martin
Position: Senior Marketing Creative
Department: Marketing
Date of Joining: January 15, 2020
Reporting Manager: Mrs. Jane Anderson

Incident Details:
Date: October 25, 2023
Location: Restroom, topmost floor
Time: midday

Description of Incident:
On the date specified, a few colleagues reported unusual behavior exhibited by Mr. Martin, which raised concerns about potential drug misuse. Witnesses mentioned that Edward appeared disoriented and was found in the restroom for an extended period. Some employees also discovered unidentified pills in close proximity to his chair.

Witness Accounts:
Ms. Marie Foster: "Edward seemed distracted and not his usual self today. He's been taking frequent breaks and appears a bit disoriented."
Mr. Benjamin Adams: "I found some pills near his chair on the floor. It's concerning, and I felt it necessary to report."

Immediate Actions Taken:
Mr. Edward Martin was approached by People Management for a preliminary conversation to understand the situation.
Mrs. Jane Anderson, his reporting manager, was made aware of the concerns.

Recommendations:
It's crucial to have a private and supportive conversation with Mr. Martin to understand if there's an underlying issue.
Consider referring Edward to our Personal Assistance Program (PAP) for counseling or support.
It may be beneficial to organize a session on drug awareness and workplace safety for all employees.
It's of utmost importance to handle this situation with sensitivity and discretion, ensuring the wellbeing of Mr. Martin and maintaining the integrity of our workplace environment. This email serves as a formal documentation of the incident. We'll determine the subsequent course of action based on your guidance and the recommendations provided.

Looking forward to your direction on this matter.
'''

# Entities map
entities_map = '''
{
"Mr. Benjamin Mitchell": "Mr. Edward Martin",
"Mrs. Alice Williams": "Mrs. Charlotte Johnson",
"January 15, 2020": "January 15, 2020",
"Mrs. Jane Fitzgerald": "Mrs. Jane Anderson",
"October 25, 2023": "October 25, 2023",
"4th Floor": "topmost floor",
"11:45 AM": "midday",
"Emily Clark": "Marie Foster",
"Employee Assistance Program (EAP)": "Personal Assistance Program (PAP)",
"Robert Taylor": "Benjamin Adams",
}
'''

# Step 1: Process the anonymized text with GPT-4

import openai
openai.api_key = "<API_KEY>"

# openai.ChatCompletion is the pre-1.0 SDK interface (removed in openai 1.0); run this with openai<1.0
completion = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": f"Write an official warning letter to the employee, that we do not tolerate it and with the next incident he gets fired in the name of Mrs. Charlotte Johnson Human Resources Manager. Here is the report with information. {anonymized_text}"}
    ]
)

print(completion.choices[0].message.content)
# Output:
'''
Subject: Official Warning – Substance Misuse Policy Violation

Dear Mr. Edward Martin,

We hope this letter finds you well. This letter serves as an official warning in regards to an incident that occurred on October 25, 2023, in which signs of drug misuse were reported. This alleged conduct is highly alarming and contrary to our company’s policies and guidelines.

The incident detailed allegations of unusual behavior indicative of possible substance abuse. Colleagues reported that you appeared disoriented and were found in the restroom for an extended period. Additionally, unidentified pills were discovered near your workspace.

Our company is committed to providing a safe and non-detrimental work environment for all its workforce. This commitment is compromised when any type of drug misuse occurs. We draw your attention to our Employee Handbook, specifically 'Section 5: Substance Misuse', that states any illegal drug use, substance misuse or distribution thereof is strictly prohibited and could be subject to severe disciplinary action, including termination.

This is an official warning that such behavior misaligns with our workplace norms and cannot be tolerated. Another incident like this or similar breach of company guidelines may lead to further disciplinary action, up to and including termination of employment.

Please note that this is not an assumption of your guilt but an assertion to remain vigilant against any practices that could harm you or the workplace environment. We encourage you to utilize our Personal Assistance Program (PAP) as a tool for counseling and support, if needed.

We believe in your potential to rectify this situation and to maintain the high standards we are all accustomed to in our organization.

Should you need assistance or if you wish to discuss this matter further, please feel free to reach out to me. We appreciate your immediate attention to this important issue.

Yours sincerely,

Mrs. Charlotte Johnson
Human Resources Manager
'''


# Step 2: Restore the original entities in the text processed by GPT-4

import ast

def swap_keys_and_values_in_string(input_str):
    # ... (function body collapsed in this diff view)
    return swapped_str


gpt_response = completion.choices[0].message.content
entities_map = swap_keys_and_values_in_string(entities_map)
prompt = f"USER: Rephrase with {entities_map}: {gpt_response}\n\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')
outputs = model.generate(inputs.input_ids, max_new_tokens=2048)
# outputs[0] starts with the prompt tokens; decode only the newly generated part
output_text = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)

# Output:
'''
Subject: Official Warning – Substance Misuse Policy Violation

Dear Mr. Benjamin Mitchell,

We hope this letter finds you well. This letter serves as an official warning in regards to an incident that occurred on January 15, 2020, in which signs of drug misuse were reported. This alleged conduct is highly alarming and contrary to our company’s policies and guidelines.

The incident detailed allegations of unusual behavior indicative of possible substance abuse. Colleagues reported that you appeared disoriented and were found in the restroom for an extended period. Additionally, unidentified pills were discovered near your workspace.

Our company is committed to providing a safe and non-detrimental work environment for all its workforce. This commitment is compromised when any type of drug misuse occurs. We draw your attention to our Employee Handbook, specifically 'Section 5: Substance Misuse', that states any illegal drug use, substance misuse or distribution thereof is strictly prohibited and could be subject to severe disciplinary action, including termination.

This is an official warning that such behavior misaligns with our workplace norms and cannot be tolerated. Another incident like this or similar breach of company guidelines may lead to further disciplinary action, up to and including termination of employment.

Please note that this is not an assumption of your guilt but an assertion to remain vigilant against any practices that could harm you or the workplace environment. We encourage you to utilize our Employee Assistance Program (EAP) as a tool for counseling and support, if needed.

We believe in your potential to rectify this situation and to maintain the high standards we are all accustomed to in our organization.

Should you need assistance or if you wish to discuss this matter further, please feel free to reach out to me. We appreciate your immediate attention to this important issue.

Yours sincerely,

Mrs. Alice Williams,
Human Resources Manager.
'''

```
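
Step 2 relies on the model to make every substitution, so it can be worth checking the result. The sketch below is a hypothetical helper (`find_leftover_entities` is not part of the repository). It assumes the inverted map is available as a Python dict of anonymized entity to original entity, for example via `ast.literal_eval` on the swapped `entities_map` string, and lists the anonymized entities that still appear verbatim in the restored text.

```python
def find_leftover_entities(text: str, anonymized_to_original: dict[str, str]) -> list[str]:
    """Hypothetical check: anonymized entities still present in restored text.

    Identity entries (such as dates that were kept unchanged) are skipped,
    since their presence in the text is expected.
    """
    return [
        anonymized
        for anonymized, original in anonymized_to_original.items()
        if anonymized != original and anonymized in text
    ]


# Anonymized entity -> original entity, taken from the example's entities map
mapping = {
    "Mr. Edward Martin": "Mr. Benjamin Mitchell",
    "Personal Assistance Program (PAP)": "Employee Assistance Program (EAP)",
    "October 25, 2023": "October 25, 2023",
}
restored = "Dear Mr. Benjamin Mitchell, ... please use our Personal Assistance Program (PAP) ..."
print(find_leftover_entities(restored, mapping))
# ['Personal Assistance Program (PAP)']
```

This is plain substring matching, so its limits are clear: shortened mentions that are not in the map, such as "Mr. Martin" or "Edward" in the anonymized report, are not detected, and an entity replaced with the wrong original value (for example one date swapped for another) passes the check.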

…

# Dataset and Training Documentation for Audit
If you require the original dataset used for training this model, or further documentation related to its training and architecture for audit purposes, you can request this information by contacting us.