Commit dd6f5ec by Wachu2005
1 Parent(s): a9e23a5

Update README.md

Files changed (1):
  1. README.md +87 -0
README.md CHANGED
@@ -68,6 +68,93 @@ The backend restructuring is a critical component, designed to serve as a showca
  We are actively searching for a Vega.js developer who can efficiently interpret our project vision and deliver an elegant, sustainable solution
  '''
  ```
+
+
+ # Example inverted usage
+ ```python
+ import ast
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Use the GPU when one is available; otherwise fall back to the CPU
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ tokenizer = AutoTokenizer.from_pretrained("metricspace/EntityAnonymization-3B-V0.9")
+ model = AutoModelForCausalLM.from_pretrained("metricspace/EntityAnonymization-3B-V0.9", torch_dtype=torch.bfloat16).to(device)
+
+ def extract_last_assistant_response(input_text):
+     # Find the last occurrence of "ASSISTANT:" in the input text
+     marker = "ASSISTANT:"
+     marker_index = input_text.rfind(marker)
+     if marker_index == -1:
+         raise ValueError('No "ASSISTANT:" marker found in the input text')
+
+     # Return everything after the end of the last "ASSISTANT:"
+     return input_text[marker_index + len(marker):].strip()
+
+ def swap_keys_and_values_in_string(input_str):
+     # Convert the input string to a dictionary
+     input_dict = ast.literal_eval(input_str)
+
+     # Swap the keys and values
+     swapped_dict = {v: k for k, v in input_dict.items()}
+
+     # Convert the swapped dictionary back to a string
+     return str(swapped_dict)
+
+ # Sample text for entity extraction and resampling
+
+ original_text = '''Our organization, XYZ Biotech, operates at the forefront of groundbreaking pharmaceutical research, renowned for our pioneering drug development and breakthrough treatments. In light of the ever-evolving regulatory landscape and the need to safeguard our research endeavors, we've recognized the critical importance of enhancing our compliance and data security protocols. To this end, we are on the lookout for a top-notch regulatory affairs specialist to spearhead this transformation, ensuring the rigorous adherence to industry standards and the protection of our confidential research data.
+
+ This comprehensive initiative encompasses not only ensuring regulatory compliance but also the implementation of three vital security measures. We will be utilizing CipherGuard's state-of-the-art encryption technology to secure our research data, deploying BioShield's advanced security protocols for laboratory access, and integrating SecureLabs' real-time data monitoring and threat detection systems.
+
+ The enhancement of our regulatory affairs and data security measures is a critical component in safeguarding our proprietary research, reinforcing our commitment to drug development excellence. While we prioritize compliance and data protection, the user experience for our research teams and partners will remain user-friendly and efficient, whether they are using our proprietary research software, "BioDiscover," or our mobile applications.
+
+ We are actively in search of a regulatory affairs specialist who can comprehend the importance of maintaining compliance and data security in our industry and who can deliver a comprehensive, airtight solution that not only ensures our adherence to regulations but also safeguards the confidential nature of our research at XYZ Biotech.'''
+
+
+ # A different, already anonymized text with replaced entities
+
+ anonymized_text = '''ABC Pharmaceuticals, a renowned player in the pharmaceutical industry, is dedicated to pioneering drug development and breakthrough treatments. In response to the ever-evolving regulatory landscape and the need to protect our research initiatives, we have identified the paramount importance of enhancing our compliance and data security protocols. As a part of this strategic shift, we are actively searching for a top-tier regulatory affairs specialist to lead this transformation, ensuring unwavering adherence to industry standards and the safeguarding of our confidential research data.
+
+ This comprehensive initiative goes beyond regulatory compliance and entails the implementation of three crucial security measures. We will be leveraging the cutting-edge encryption technology provided by CodeGuard to secure our research data, implementing BioProtect's advanced security protocols for laboratory access, and integrating the real-time data monitoring and threat detection systems offered by SecureTech.
+
+ The enhancement of our regulatory affairs and data security measures is a pivotal component in safeguarding our proprietary research, reinforcing our commitment to excellence in drug development. While we prioritize compliance and data protection, the user experience for our research teams and partners will remain user-friendly and efficient, whether they are using our proprietary research software, "BioDiscover," or our mobile applications.
+
+ We are actively seeking a regulatory affairs specialist who comprehends the critical importance of upholding compliance and data security in our industry and possesses the expertise to deliver a comprehensive and impervious solution that ensures not only our adherence to regulations but also preserves the confidentiality of our research data at ABC Pharmaceuticals.'''
+
+
+ prompt = f'USER: Resample the entities: {original_text}\n\nASSISTANT:'
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
+ # Greedy decoding (do_sample=False), so sampling parameters such as top_k and top_p are not needed
+ outputs = model.generate(inputs.input_ids, max_new_tokens=250, do_sample=False, num_beams=1)
+ output_text_1 = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ # Keep only the text after the last "ASSISTANT:" marker
+ generated_part = extract_last_assistant_response(output_text_1)
+
+ # Invert the entity map, for example
+ # {'XYZ Biotech': 'ABC Pharmaceuticals', 'CipherGuard': 'CodeGuard', 'BioShield': 'BioProtect', 'SecureLabs': 'SecureTech', 'BioDiscover': 'BioDiscover'}
+ # becomes
+ # {'ABC Pharmaceuticals': 'XYZ Biotech', 'CodeGuard': 'CipherGuard', 'BioProtect': 'BioShield', 'SecureTech': 'SecureLabs', 'BioDiscover': 'BioDiscover'}
+ inverted_entities = swap_keys_and_values_in_string(generated_part)
+
+ prompt_2 = f"USER: Rephrase with {inverted_entities}: {anonymized_text}\n\nASSISTANT:"
+ inputs = tokenizer(prompt_2, return_tensors='pt').to(model.device)
+ outputs = model.generate(inputs.input_ids, max_new_tokens=500, do_sample=False)
+ output_text_2 = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ print(output_text_2)
+
+ '''
+ XYZ Biotech, a renowned player in the biotech industry, is dedicated to pioneering drug development and breakthrough treatments. In response to the ever-evolving regulatory landscape and the need to protect our research initiatives, we have identified the paramount importance of enhancing our compliance and data security protocols. As a part of this strategic shift, we are actively searching for a top-tier regulatory affairs specialist to lead this transformation, ensuring unwavering adherence to industry standards and the safeguarding of our confidential research data.
+ This comprehensive initiative goes beyond regulatory compliance and entails the implementation of three crucial security measures. We will be leveraging the cutting-edge encryption technology provided by CipherGuard to secure our research data, implementing BioShield's advanced security protocols for laboratory access, and integrating the real-time data monitoring and threat detection systems offered by SecureLabs.
+ The enhancement of our regulatory affairs and data security measures is a pivotal component in safeguarding our proprietary research, reinforcing our commitment to excellence in drug development. While we prioritize compliance and data protection, the user experience for our research teams and partners will remain user-friendly and efficient, whether they are using our proprietary research software, "BioDiscover," or our mobile applications.
+ We are actively seeking a regulatory affairs specialist who comprehends the critical importance of upholding compliance and data security in our industry and possesses the expertise to deliver a comprehensive and impervious solution that ensures not only our adherence to regulations but also preserves the confidentiality of our research data at XYZ Biotech.
+ '''
+ ```
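A plain key/value swap keeps only one entry when two original entities were resampled to the same replacement, so the inverse map can silently lose an entity. The following is a minimal sketch of a stricter inversion, assuming the model output is a plain Python dict literal like the one shown in the comments above. `invert_entity_map` is a hypothetical helper introduced here for illustration; it is not part of the model or of `transformers`.

```python
import ast


def invert_entity_map(map_str):
    """Parse a dict literal and swap keys and values, refusing lossy inversions."""
    entity_map = ast.literal_eval(map_str)
    if not isinstance(entity_map, dict):
        raise ValueError("expected a dict literal")

    inverted = {}
    for original, replacement in entity_map.items():
        # Two originals sharing one replacement cannot be inverted unambiguously
        if replacement in inverted:
            raise ValueError(
                f"{replacement!r} replaces both {inverted[replacement]!r} and {original!r}"
            )
        inverted[replacement] = original
    return inverted


# Entity map taken from the example comments above
entity_map_str = (
    "{'XYZ Biotech': 'ABC Pharmaceuticals', 'CipherGuard': 'CodeGuard', "
    "'BioShield': 'BioProtect', 'SecureLabs': 'SecureTech', 'BioDiscover': 'BioDiscover'}"
)
inverted = invert_entity_map(entity_map_str)
print(inverted)
```

If the check raises, one option is to rerun the resampling step rather than rephrase with an ambiguous map.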
 
  # Dataset and Training Documentation for Audit
  If you require the original dataset used for training this model, or further documentation related to its training and architecture for audit purposes, you can request this information by contacting us.