nazneen committed on
Commit 3ef3dac
1 Parent(s): bb72180

model documentation

Files changed (1)
  1. README.md +433 -54
README.md CHANGED
@@ -1,56 +1,435 @@
- ---
- language:
- - en
- tags:
- - text-classification
- - bert
- - pytorch
- license: apache-2.0
- widget:
- - text: "In fiscal year 2019, we reduced our comprehensive carbon footprint for the fourth consecutive year—down 35 percent compared to 2015, when Apple’s carbon emissions peaked, even as net revenue increased by 11 percent over that same period. In the past year, we avoided over 10 million metric tons from our emissions reduction initiatives—like our Supplier Clean Energy Program, which lowered our footprint by 4.4 million metric tons."
- example_title: "Carbon Footprint"
- ---
-
- # ESG BERT
-
- (Uploaded from https://github.com/mukut03/ESG-BERT)
-
- **Domain Specific BERT Model for Text Mining in Sustainable Investing**
-
- Read more about this pre-trained model [here.](https://towardsdatascience.com/nlp-meets-sustainable-investing-d0542b3c264b?source=friends_link&sk=1f7e6641c3378aaff319a81decf387bf)
-
- **In collaboration with [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/)**
-
-
- ### Labels
-
- 0: Business_Ethics
- 1: Data_Security
- 2: Access_And_Affordability
- 3: Business_Model_Resilience
- 4: Competitive_Behavior
- 5: Critical_Incident_Risk_Management
- 6: Customer_Welfare
- 7: Director_Removal
- 8: Employee_Engagement_Inclusion_And_Diversity
- 9: Employee_Health_And_Safety
- 10: Human_Rights_And_Community_Relations
- 11: Labor_Practices
- 12: Management_Of_Legal_And_Regulatory_Framework
- 13: Physical_Impacts_Of_Climate_Change
- 14: Product_Quality_And_Safety
- 15: Product_Design_And_Lifecycle_Management
- 16: Selling_Practices_And_Product_Labeling
- 17: Supply_Chain_Management
- 18: Systemic_Risk_Management
- 19: Waste_And_Hazardous_Materials_Management
- 20: Water_And_Wastewater_Management
- 21: Air_Quality
- 22: Customer_Privacy
- 23: Ecological_Impacts
- 24: Energy_Management
- 25: GHG_Emissions
-
-
- ### References:
- [1] https://medium.com/analytics-vidhya/deploy-huggingface-s-bert-to-production-with-pytorch-serve-27b068026d18
 
+ # Model Card for ESG-BERT
+ Domain Specific BERT Model for Text Mining in Sustainable Investing
+
+
+ # Model Details
+
+ ## Model Description
+
+ - **Developed by:** [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/)
+ - **Shared by [Optional]:** Hugging Face
+ - **Model type:** Language model
+ - **Language(s) (NLP):** en
+ - **License:** More information needed
+ - **Related Models:**
+   - **Parent Model:** BERT
+ - **Resources for more information:**
+   - [GitHub Repo](https://github.com/mukut03/ESG-BERT)
+   - [Blog Post](https://towardsdatascience.com/nlp-meets-sustainable-investing-d0542b3c264b?source=friends_link&sk=1f7e6641c3378aaff319a81decf387bf)
+
+ # Uses
+
+ ## Direct Use
+
+ Text mining in sustainable investing: the model classifies ESG-related text, such as passages from corporate sustainability reports, into the 26 ESG categories listed at the end of this card.
+
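+ A minimal sketch of direct use, assuming the fine-tuned classification weights (pytorch_model.bin, config.json, and vocab.txt, linked under Testing Data below) have been downloaded into a hypothetical local ./bert_model directory:
+
+ ```
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+ # "./bert_model" is a placeholder for wherever the fine-tuned files were saved.
+ tokenizer = AutoTokenizer.from_pretrained("./bert_model")
+ model = AutoModelForSequenceClassification.from_pretrained("./bert_model")
+
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+ print(classifier("In fiscal year 2019, we reduced our comprehensive carbon footprint for the fourth consecutive year."))
+ ```
+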
+ ## Downstream Use [Optional]
+
+ The applications of ESG-BERT extend well beyond text classification: it can be fine-tuned to perform various other downstream NLP tasks in the domain of sustainable investing, as sketched below.
+
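+ A hedged sketch of what such fine-tuning could look like with the Trainer API; the CSV file name, its "text"/"label" columns, and the "./bert_model" checkpoint path are hypothetical placeholders, not files shipped with this card:
+
+ ```
+ from datasets import load_dataset
+ from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
+                           Trainer, TrainingArguments)
+
+ # Hypothetical labeled dataset with "text" and "label" columns.
+ dataset = load_dataset("csv", data_files="esg_sentences.csv")["train"].train_test_split(test_size=0.1)
+ tokenizer = AutoTokenizer.from_pretrained("./bert_model")
+
+ def tokenize(batch):
+     # Truncate/pad so every example fits BERT's fixed-length input.
+     return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
+
+ dataset = dataset.map(tokenize, batched=True)
+
+ model = AutoModelForSequenceClassification.from_pretrained("./bert_model", num_labels=26)
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir="esg_bert_finetuned", num_train_epochs=3),
+     train_dataset=dataset["train"],
+     eval_dataset=dataset["test"],
+ )
+ trainer.train()
+ ```
+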
+ ## Out-of-Scope Use
+
+ The model should not be used to intentionally create hostile or alienating environments for people.
+
+ # Bias, Risks, and Limitations
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
+
+ ## Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ # Training Details
+
+ ## Training Data
+
+ More information needed
+
+ ## Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ ### Preprocessing
+
+ More information needed
+
+ ### Speeds, Sizes, Times
+
+ More information needed
+
+ # Evaluation
+
+ ## Testing Data, Factors & Metrics
+
+ ### Testing Data
+
+ The fine-tuned model for text classification is also available [here](https://drive.google.com/drive/folders/1Qz4HP3xkjLfJ6DGCFNeJ7GmcPq65_HVe?usp=sharing). It can be used directly to make predictions in just a few steps: first, download the fine-tuned pytorch_model.bin, config.json, and vocab.txt files.
+
+ ### Factors
+
+ More information needed
+
+ ### Metrics
+
+ More information needed
+
+ ## Results
+
+ ESG-BERT was further trained on unstructured text data, with accuracies of 100% and 98% on the Next Sentence Prediction and Masked Language Modelling tasks, respectively. Fine-tuning ESG-BERT for text classification yielded an F-1 score of 0.90. For comparison, the general BERT (BERT-base) model scored 0.79 after fine-tuning, and the scikit-learn approach scored 0.67.
+
+ # Model Examination
+
+ More information needed
+
+ # Environmental Impact
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** More information needed
+ - **Hours used:** More information needed
+ - **Cloud Provider:** More information needed
+ - **Compute Region:** More information needed
+ - **Carbon Emitted:** More information needed
+
+ # Technical Specifications [optional]
+
+ ## Model Architecture and Objective
+
+ More information needed
+
+ ## Compute Infrastructure
+
+ More information needed
+
+ ### Hardware
+
+ More information needed
+
+ ### Software
+
+ JDK 11 is needed to serve the model.
+
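+ Since this dependency is easy to miss, here is a hypothetical pre-flight check from Python; it assumes that a correctly installed JDK puts java on the PATH:
+
+ ```
+ # Hypothetical sanity check: TorchServe needs a Java 11 runtime available.
+ import shutil
+ import subprocess
+
+ assert shutil.which("java"), "Java runtime not found; install JDK 11 before running torchserve"
+ # "java -version" prints its report to stderr, not stdout.
+ print(subprocess.run(["java", "-version"], capture_output=True, text=True).stderr)
+ ```
+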
+ # Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ More information needed
+
+ **APA:**
+
+ More information needed
+
+ # Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ More information needed
+
+ # More Information [optional]
+
+ More information needed
+
+ # Model Card Authors [optional]
+
+ [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/), in collaboration with Ezi Ozoani and the Hugging Face team
+
+ # Model Card Contact
+
+ More information needed
+
+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```
+ pip install torchserve torch-model-archiver
+ pip install torchvision
+ pip install transformers
+ ```
+
+ Next up, we'll set up the handler script. It is a basic handler for text classification that can be improved upon. Save this script as "handler.py" in your directory. [1]
+
+ ```
+ from abc import ABC
+ import json
+ import logging
+ import os
+
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ from ts.torch_handler.base_handler import BaseHandler
+
+ logger = logging.getLogger(__name__)
+
+
+ class TransformersClassifierHandler(BaseHandler, ABC):
+     """
+     Transformers text classifier handler class. This handler takes a text (string)
+     as input and returns the classification label based on the serialized
+     transformers checkpoint.
+     """
+
+     def __init__(self):
+         super(TransformersClassifierHandler, self).__init__()
+         self.initialized = False
+
+     def initialize(self, ctx):
+         self.manifest = ctx.manifest
+         properties = ctx.system_properties
+         model_dir = properties.get("model_dir")
+         self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
+
+         # Read the serialized model and tokenizer from the extracted archive
+         self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
+         self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
+         self.model.to(self.device)
+         self.model.eval()
+         logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))
+
+         # Read the mapping file, index to object name
+         mapping_file_path = os.path.join(model_dir, "index_to_name.json")
+         if os.path.isfile(mapping_file_path):
+             with open(mapping_file_path) as f:
+                 self.mapping = json.load(f)
+         else:
+             logger.warning('Missing the index_to_name.json file. Inference output will not include class name.')
+             self.mapping = None  # fall back to returning the raw label index
+
+         self.initialized = True
+
+     def preprocess(self, data):
+         """Very basic preprocessing code - only tokenizes.
+         Extend with your own preprocessing steps as needed.
+         """
+         text = data[0].get("data")
+         if text is None:
+             text = data[0].get("body")
+         sentences = text.decode('utf-8')
+         logger.info("Received text: '%s'", sentences)
+
+         inputs = self.tokenizer.encode_plus(
+             sentences,
+             add_special_tokens=True,
+             return_tensors="pt"
+         )
+         return inputs
+
+     def inference(self, inputs):
+         """Predict the class of a text using a trained transformer model."""
+         # NOTE: This makes the assumption that your model expects text to be tokenized
+         # with "input_ids" and "token_type_ids" - which is true for some popular
+         # transformer models, e.g. BERT. If your transformer model expects different
+         # tokenization, adapt this code to suit its expected input format.
+         prediction = self.model(
+             inputs['input_ids'].to(self.device),
+             token_type_ids=inputs['token_type_ids'].to(self.device)
+         )[0].argmax().item()
+         logger.info("Model predicted: '%s'", prediction)
+
+         if self.mapping:
+             prediction = self.mapping[str(prediction)]
+         return [prediction]
+
+     def postprocess(self, inference_output):
+         # TODO: Add any needed post-processing of the model predictions here
+         return inference_output
+
+
+ _service = TransformersClassifierHandler()
+
+
+ def handle(data, context):
+     try:
+         if not _service.initialized:
+             _service.initialize(context)
+         if data is None:
+             return None
+
+         data = _service.preprocess(data)
+         data = _service.inference(data)
+         data = _service.postprocess(data)
+
+         return data
+     except Exception as e:
+         raise e
+ ```
+
+ TorchServe uses a format called MAR (Model Archive). We can convert our PyTorch model to a .mar file using this command:
+
+ ```
+ torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json,./bert_model/vocab.txt" --handler "./handler.py"
+ ```
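+
+ The handler above looks for an index_to_name.json file in the extracted model directory so that predictions come back as label names rather than raw indices. A small, optional sketch that generates this file from the label mapping listed at the end of this card (the "./bert_model" output path is an assumption):
+
+ ```
+ # Hypothetical helper: write index_to_name.json so the handler can map
+ # predicted indices to the ESG label names from label_dict.txt below.
+ import json
+
+ labels = [
+     "Business_Ethics", "Data_Security", "Access_And_Affordability",
+     "Business_Model_Resilience", "Competitive_Behavior",
+     "Critical_Incident_Risk_Management", "Customer_Welfare", "Director_Removal",
+     "Employee_Engagement_Inclusion_And_Diversity", "Employee_Health_And_Safety",
+     "Human_Rights_And_Community_Relations", "Labor_Practices",
+     "Management_Of_Legal_And_Regulatory_Framework",
+     "Physical_Impacts_Of_Climate_Change", "Product_Quality_And_Safety",
+     "Product_Design_And_Lifecycle_Management",
+     "Selling_Practices_And_Product_Labeling", "Supply_Chain_Management",
+     "Systemic_Risk_Management", "Waste_And_Hazardous_Materials_Management",
+     "Water_And_Wastewater_Management", "Air_Quality", "Customer_Privacy",
+     "Ecological_Impacts", "Energy_Management", "GHG_Emissions",
+ ]
+
+ with open("./bert_model/index_to_name.json", "w") as f:
+     json.dump({str(i): name for i, name in enumerate(labels)}, f, indent=2)
+ ```
+
+ If you generate this file, append it to the --extra-files list in the torch-model-archiver command so it is packed into the archive.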
+
+ Move the .mar file into a new directory:
+
+ ```
+ mkdir model_store && mv bert.mar model_store
+ ```
+
+ Finally, we can start TorchServe using the command:
+
+ ```
+ torchserve --start --model-store model_store --models bert=bert.mar
+ ```
+
+ We can now query the model from another terminal window using the Inference API. We pass a text file (predict.txt) containing the text that the model will classify:
+
+ ```
+ curl -X POST http://127.0.0.1:8080/predictions/bert -T predict.txt
+ ```
+
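+ Equivalently, the same request can be issued from Python; a minimal sketch using the requests library (the sample sentence is a placeholder):
+
+ ```
+ # Hypothetical equivalent of the curl call above.
+ import requests
+
+ text = "In fiscal year 2019, we reduced our comprehensive carbon footprint for the fourth consecutive year."
+ response = requests.post("http://127.0.0.1:8080/predictions/bert", data=text.encode("utf-8"))
+ print(response.text)  # a label name, or its index if no index_to_name.json was packaged
+ ```
+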
+ This returns a label index that corresponds to a textual label. The mapping is stored in the label_dict.txt dictionary file:
+
+ ```
+ __label__Business_Ethics : 0
+ __label__Data_Security : 1
+ __label__Access_And_Affordability : 2
+ __label__Business_Model_Resilience : 3
+ __label__Competitive_Behavior : 4
+ __label__Critical_Incident_Risk_Management : 5
+ __label__Customer_Welfare : 6
+ __label__Director_Removal : 7
+ __label__Employee_Engagement_Inclusion_And_Diversity : 8
+ __label__Employee_Health_And_Safety : 9
+ __label__Human_Rights_And_Community_Relations : 10
+ __label__Labor_Practices : 11
+ __label__Management_Of_Legal_And_Regulatory_Framework : 12
+ __label__Physical_Impacts_Of_Climate_Change : 13
+ __label__Product_Quality_And_Safety : 14
+ __label__Product_Design_And_Lifecycle_Management : 15
+ __label__Selling_Practices_And_Product_Labeling : 16
+ __label__Supply_Chain_Management : 17
+ __label__Systemic_Risk_Management : 18
+ __label__Waste_And_Hazardous_Materials_Management : 19
+ __label__Water_And_Wastewater_Management : 20
+ __label__Air_Quality : 21
+ __label__Customer_Privacy : 22
+ __label__Ecological_Impacts : 23
+ __label__Energy_Management : 24
+ __label__GHG_Emissions : 25
+ ```
+
+ </details>