MichaelWelsch committed on
Commit d294f61
1 Parent(s): 1118108

Update README.md

Files changed (1)
  1. README.md +120 -21
README.md CHANGED
@@ -21,12 +21,18 @@ Bulgarian, Chinese, Czech, Dutch, English, Estonian, Finnish, French, German, Gr
  Introducing a cutting-edge model tailored to the task of extracting entities from sensitive text and anonymizing it. This model specializes in identifying and safeguarding confidential information, ensuring organizations' compliance with stringent data privacy regulations and minimizing the potential for inadvertent disclosure of classified data and trade secrets.
  # Example Usage
  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
  tokenizer = AutoTokenizer.from_pretrained("metricspace/EntityAnonymization-3B-V0.9")
  model = AutoModelForCausalLM.from_pretrained("metricspace/EntityAnonymization-3B-V0.9", torch_dtype=torch.bfloat16)
-
- import re

  def extract_last_assistant_response(input_text):
      # Find the occurrence of "ASSISTANT:" in the input text
@@ -37,39 +43,132 @@ def extract_last_assistant_response(input_text):
      response = input_text[start_index:].strip()
      return response


- text_to_anonymize = '''Our organization manages a sophisticated data analytics platform ([login to view URL]) that highlights our cutting-edge data visualization techniques. In response to evolving business needs, we've recognized the imperative to optimize our data handling processes. As part of this initiative, we're seizing the opportunity to standardize the codebase for our web and mobile applications using a unified approach with Vue.js.
- We're currently seeking a talented developer to spearhead this transformation, ensuring a seamless separation between backend data processing and frontend presentation layers. The revised architecture will incorporate three critical APIs (Google Maps for location services, AccuWeather for weather data, and our in-house Analytica API for advanced analytics).
- The backend restructuring is a critical component, designed to serve as a showcase for the capabilities of our Analytica API. The frontend, both for the web and mobile interfaces, will maintain the current user experience using the existing design assets.
- We are actively searching for a Vue.js developer who can efficiently interpret our project vision and deliver an elegant, sustainable solution.'''


- prompt = f'USER: Resample the entities: {text_to_anonymize}\n\nASSISTANT:'
- inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
- output_entities = model.generate(inputs.input_ids, max_new_tokens=250, do_sample=False, top_k=50, top_p=0.98, num_beams=1)
- output_entities_text = tokenizer.decode(output_entities[0], skip_special_tokens=True)

- # extracting entities text from assistant response
- generated_part = extract_assistant_response(output_text_1)
 
- prompt_2 = f"USER: Rephrase with {generated_part}: {text_to_anonymize}\n\nASSISTANT:"
- inputs = tokenizer(prompt_2, return_tensors='pt').to('cuda')
- output_resampled = model.generate(inputs.input_ids, max_new_tokens=500, do_sample=False, top_k=50, top_p=0.98)
- output_resampled_text = tokenizer.decode(output_resampled[0], skip_special_tokens=True)


- print(output_resampled_text)

  #output
  '''
- Our enterprise manages an advanced data analysis platform ([login to view URL]) that highlights our innovative data visualization methods. In response to evolving business needs, we've recognized the imperative to optimize our data handling processes. As part of this initiative, we're seizing the opportunity to standardize the codebase for our online and mobile applications using a unified approach with Vega.js.
- We're currently seeking a talented developer to spearhead this transformation, ensuring a seamless separation between backend data processing and frontend presentation layers. The revised architecture will incorporate three critical APIs (Maple Maps for location services, MeteorWeather for weather data, and our in-house Analytica API for advanced analytics).
- The backend restructuring is a critical component, designed to serve as a showcase for the capabilities of our Analytica API. The frontend, both for the web and mobile interfaces, will maintain the current user experience using the existing design assets.
- We are actively searching for a Vega.js developer who can efficiently interpret our project vision and deliver an elegant, sustainable solution
  '''
  ```
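The removed lines call `extract_assistant_response(output_text_1)`, but neither name seems to be defined anywhere in the old example; the helper it does define is `extract_last_assistant_response`, which the new version calls instead. The diff shows only that helper's opening comment and its last two lines, so the following is a hypothetical stand-in rather than the README's code. It assumes the helper should return whatever follows the last `ASSISTANT:` marker, which matches its name and the visible `input_text[start_index:].strip()` slice.

```python
def extract_last_assistant_response(input_text: str) -> str:
    """Return the text after the last "ASSISTANT:" marker (hypothetical sketch)."""
    marker = "ASSISTANT:"
    # Find the last occurrence of the marker; str.rfind returns -1 if it is absent.
    index = input_text.rfind(marker)
    if index == -1:
        # Assumption: with no marker present, return the whole text, stripped.
        return input_text.strip()
    start_index = index + len(marker)
    response = input_text[start_index:].strip()
    return response


decoded = "USER: Resample the entities: Hello Bob\n\nASSISTANT: {'Bob': 'Tom'}"
print(extract_last_assistant_response(decoded))  # {'Bob': 'Tom'}
```

Searching from the right only matters when the decoded text holds more than one turn; for a single turn, `find` and `rfind` give the same result.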
 
 
 
  # Example inverted usage
  ```python
  import torch
 
  Introducing a cutting-edge model tailored to the task of extracting entities from sensitive text and anonymizing it. This model specializes in identifying and safeguarding confidential information, ensuring organizations' compliance with stringent data privacy regulations and minimizing the potential for inadvertent disclosure of classified data and trade secrets.
  # Example Usage
  ```python
+ !pip install sentencepiece
+ !pip install transformers
+ ```
+ ```python
  import torch
+ import json
+ import re
  from transformers import AutoTokenizer, AutoModelForCausalLM
+
  tokenizer = AutoTokenizer.from_pretrained("metricspace/EntityAnonymization-3B-V0.9")
  model = AutoModelForCausalLM.from_pretrained("metricspace/EntityAnonymization-3B-V0.9", torch_dtype=torch.bfloat16)
+ model.to("cuda:0")

  def extract_last_assistant_response(input_text):
      # Find the occurrence of "ASSISTANT:" in the input text
 
      response = input_text[start_index:].strip()
      return response

+ # Input example
+ text_to_anonymize = '''Subject: HR Incident Report: Speculation of Drug Misuse by Mr. Benjamin Mitchell

+ Dear Mrs. Alice Williams,

+ I trust you're well. I wish to bring to your attention a concerning matter involving one of our esteemed employees, Mr. Benjamin Mitchell.

+ Employee Details:

+ Name: Benjamin Mitchell
+ Position: Senior Marketing Creative
+ Department: Marketing
+ Date of Joining: January 15, 2020
+ Reporting Manager: Mrs. Jane Fitzgerald

+ Incident Details:
+ Date: October 25, 2023
+ Location: Restroom, 4th Floor
+ Time: 11:45 AM
+
+ Description of Incident:
+ On the date specified, a few colleagues reported unusual behavior exhibited by Mr. Mitchell, which raised concerns about potential drug misuse. Witnesses mentioned that Benjamin appeared disoriented and was found in the restroom for an extended period. Some employees also discovered unidentified pills in close proximity to his chair.
+
+ Witness Accounts:
+ Ms. Emily Clark: "Benjamin seemed distracted and not his usual self today. He's been taking frequent breaks and appears a bit disoriented."
+ Mr. Robert Taylor: "I found some pills near his chair on the floor. It's concerning, and I felt it necessary to report."

+ Immediate Actions Taken:
+ Mr. Benjamin Mitchell was approached by HR for a preliminary conversation to understand the situation.
+ Mrs. Jane Fitzgerald, his reporting manager, was made aware of the concerns.

+ Recommendations:
+ It's crucial to have a private and supportive conversation with Mr. Mitchell to understand if there's an underlying issue.
+ Consider referring Benjamin to our Employee Assistance Program (EAP) for counseling or support.
+ It may be beneficial to organize a session on drug awareness and workplace safety for all employees.
+ It's of utmost importance to handle this situation with sensitivity and discretion, ensuring the wellbeing of Mr. Mitchell and maintaining the integrity of our workplace environment. This email serves as a formal documentation of the incident. We'll determine the subsequent course of action based on your guidance and the recommendations provided.

+ Looking forward to your direction on this matter.
+ '''
+ print(text_to_anonymize)
+
+ # Step 1: Extracting entities from text
+ prompt = f'USER: Resample the entities: {text_to_anonymize}\n\nASSISTANT:'
+ inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')
+ output_entities = model.generate(inputs.input_ids, max_new_tokens=250, do_sample=True, temperature=0.85)
+ raw_output_entities_text = tokenizer.decode(output_entities[0])
+ entities = extract_last_assistant_response(raw_output_entities_text)
+
+ print('-----------Entities----------------')
+ try:
+     entities = re.search(r"\{.*?\}", entities, re.DOTALL).group(0)
+     data_dict = eval(entities)
+     formatted_json = json.dumps(data_dict, indent=4)
+     print(formatted_json)
+ except Exception:
+     # badly formatted output: print the entities text as-is
+     print(entities)
  #output
  '''
+ {
+     "Mr. Benjamin Mitchell": "Mr. Edward Martin",
+     "Mrs. Alice Williams": "Mrs. Charlotte Johnson",
+     "January 15, 2020": "January 15, 2020",
+     "Mrs. Jane Fitzgerald": "Mrs. Jane Anderson",
+     "October 25, 2023": "October 25, 2023",
+     "4th Floor": "topmost floor",
+     "11:45 AM": "midday",
+     "Emily Clark": "Marie Foster",
+     "Employee Assistance Program (EAP)": "Personal Assistance Program (PAP)",
+     "Robert Taylor": "Benjamin Adams",
+ }
  '''
+
+ # Step 2: Use entities to resample the original text
+ prompt_2 = f"USER: Rephrase with {entities}: {text_to_anonymize}\n\nASSISTANT:"
+ inputs = tokenizer(prompt_2, return_tensors='pt').to('cuda:0')
+ output_resampled = model.generate(inputs.input_ids, max_length=2048)
+ raw_output_resampled_text = tokenizer.decode(output_resampled[0])
+ resampled_text = extract_last_assistant_response(raw_output_resampled_text)
+ print('---------Anonymized Version--------')
+ print(resampled_text)
+ #output:
+ '''
+ Subject: HR Incident Report: Speculation of Drug Misuse by Mr. Edward Martin
+
+ Dear Mrs. Charlotte Johnson,
+
+ I trust you're well. I wish to bring to your attention a concerning matter involving one of our esteemed employees, Mr. Edward Martin.
+
+ Employee Details:
+
+ Name: Edward Martin
+ Position: Senior Marketing Creative
+ Department: Marketing
+ Date of Joining: January 15, 2020
+ Reporting Manager: Mrs. Jane Anderson
+
+ Incident Details:
+ Date: October 25, 2023
+ Location: Restroom, topmost floor
+ Time: midday
+
+ Description of Incident:
+ On the date specified, a few colleagues reported unusual behavior exhibited by Mr. Martin, which raised concerns about potential drug misuse. Witnesses mentioned that Edward appeared disoriented and was found in the restroom for an extended period. Some employees also discovered unidentified pills in close proximity to his chair.
+
+ Witness Accounts:
+ Ms. Marie Foster: "Edward seemed distracted and not his usual self today. He's been taking frequent breaks and appears a bit disoriented."
+ Mr. Benjamin Adams: "I found some pills near his chair on the floor. It's concerning, and I felt it necessary to report."
+
+ Immediate Actions Taken:
+ Mr. Edward Martin was approached by People Management for a preliminary conversation to understand the situation.
+ Mrs. Jane Anderson, his reporting manager, was made aware of the concerns.
+
+ Recommendations:
+ It's crucial to have a private and supportive conversation with Mr. Martin to understand if there's an underlying issue.
+ Consider referring Edward to our Personal Assistance Program (PAP) for counseling or support.
+ It may be beneficial to organize a session on drug awareness and workplace safety for all employees.
+ It's of utmost importance to handle this situation with sensitivity and discretion, ensuring the wellbeing of Mr. Martin and maintaining the integrity of our workplace environment. This email serves as a formal documentation of the incident. We'll determine the subsequent course of action based on your guidance and the recommendations provided.
+
+ Looking forward to your direction on this matter.
+ '''
+
  ```
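Both steps in the new example wrap their input in the same plain-text layout, `USER: <instruction>: <text>`, a blank line, then `ASSISTANT:`, and differ only in the instruction. As a minimal sketch, the hypothetical helpers below (`build_prompt`, `entity_prompt` and `rephrase_prompt` are not part of the README) produce exactly the strings that the two f-strings build, so the format is defined in one place.

```python
def build_prompt(instruction: str, text: str) -> str:
    """Wrap an instruction and its input in the USER/ASSISTANT layout from the README."""
    return f"USER: {instruction}: {text}\n\nASSISTANT:"


def entity_prompt(text: str) -> str:
    # Step 1: ask the model to propose replacements for the entities in `text`.
    return build_prompt("Resample the entities", text)


def rephrase_prompt(entities: str, text: str) -> str:
    # Step 2: ask the model to rewrite `text` using the mapping returned by step 1.
    return build_prompt(f"Rephrase with {entities}", text)


print(repr(entity_prompt("Dear Mrs. Alice Williams,")))
# 'USER: Resample the entities: Dear Mrs. Alice Williams,\n\nASSISTANT:'
```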
 
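Step 1 turns the model's reply into a dictionary with `eval`, which will execute any Python expression the model happens to emit. A safer variant, offered as an alternative rather than as the README's code, keeps the same regex but parses with the standard library's `ast.literal_eval`, which accepts only literals. It still tolerates the trailing comma seen in the sample output, which `json.loads` would reject. The helper name `parse_entity_mapping` is hypothetical.

```python
import ast
import json
import re
from typing import Optional


def parse_entity_mapping(reply: str) -> Optional[dict]:
    """Parse the first {...} block in a model reply as a dict, or return None."""
    # Same non-greedy pattern as the README: from the first "{" to the first "}".
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        # literal_eval accepts only Python literals (strings, numbers, dicts, ...),
        # so unlike eval() it cannot call functions or import modules.
        mapping = ast.literal_eval(match.group(0))
    except (ValueError, TypeError, SyntaxError):
        return None
    return mapping if isinstance(mapping, dict) else None


reply = '''{
    "Mr. Benjamin Mitchell": "Mr. Edward Martin",
    "4th Floor": "topmost floor",
    "Robert Taylor": "Benjamin Adams",
}'''
mapping = parse_entity_mapping(reply)
print(json.dumps(mapping, indent=4))
```

When the helper returns `None`, the README's fallback of printing the raw text still applies.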
 
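Step 2 is free-form generation, so it can be worth checking that none of the original entities survived the rewrite. A simple check, sketched with a hypothetical `find_leaked_entities` helper, looks for every key of the step 1 mapping in the anonymized text and skips entries the model kept unchanged (such as the dates in the sample output). It only catches verbatim matches: a leftover first name on its own would not be flagged unless it is itself a key. The `output` string below is made up so that one name is missed.

```python
def find_leaked_entities(mapping: dict, anonymized_text: str) -> list:
    """List original entities from `mapping` that still appear verbatim in the output.

    Entries mapped to themselves are skipped, since keeping them was the model's choice.
    """
    return [
        original
        for original, replacement in mapping.items()
        if original != replacement and original in anonymized_text
    ]


mapping = {
    "Mr. Benjamin Mitchell": "Mr. Edward Martin",
    "January 15, 2020": "January 15, 2020",
    "Emily Clark": "Marie Foster",
}
output = "Name: Edward Martin\nDate of Joining: January 15, 2020\nMs. Emily Clark: ..."
print(find_leaked_entities(mapping, output))  # ['Emily Clark']
```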
+
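The model's rewrite is not a plain find-and-replace: in the sample output it also changes "HR" to "People Management", which is not in the mapping. For comparison, or as a fallback when generation fails, the mapping can be applied deterministically. This is a swapped-in technique, not what the model does, and `apply_mapping` is a hypothetical helper. It builds one regex alternation with longer keys first, so a key that is a prefix of another cannot pre-empt it, and substitutes in a single pass, so a replacement value (for example "Benjamin Adams") is never itself replaced. The extra "Benjamin" entry in the example is made up to show that second property.

```python
import re


def apply_mapping(text: str, mapping: dict) -> str:
    """Replace every key of `mapping` in `text` with its value in a single pass."""
    if not mapping:
        return text
    # Longest keys first, so the alternation prefers the most specific match.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # One pass: text inserted by a replacement is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


mapping = {
    "Mr. Benjamin Mitchell": "Mr. Edward Martin",
    "Benjamin": "Edward",  # hypothetical extra entry, not in the README's output
    "Robert Taylor": "Benjamin Adams",
}
text = "Mr. Benjamin Mitchell was seen by Robert Taylor. Benjamin left early."
print(apply_mapping(text, mapping))
# Mr. Edward Martin was seen by Benjamin Adams. Edward left early.
```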
  # Example inverted usage
  ```python
  import torch