---
license: apache-2.0
language:
- en
tags:
- Token Classification
widget:
- text: "Monitored Natural Attenuation and, if necessary as a contingency, In Situ Chemical Oxidation to address the injection of a strong chemical oxidant to chemically treat the before the contingency can be implemented at the spill site."
  example_title: "example 1"
- text: "Site was identified as a potential source of groundwater contamination after the City performed Assessments were investigated further for potential contamination."
  example_title: "example 2"
- text: "Chromium releases from the UST is probably a major contributor to groundwater contamination in this area."
  example_title: "example 3"
---
## About the Model
An environmental named entity recognition (NER) model, trained on a USEPA dataset to recognize seven environmental due diligence entity types in text such as remediation reports, records of decision, and 5-year records. The model is built on top of distilbert-base-uncased.

- Dataset: https://data.mendeley.com/datasets/tx6vmd4g9p/4
- Dataset Research Paper: https://doi.org/10.1016/j.dib.2022.108579

## Usage
The easiest way to try the model is through the Hugging Face Inference API. Alternatively, load it with the `pipeline` object provided by the transformers library, or load the tokenizer and model directly.
```python
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="d4data/EnviDueDiligence_NER")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("d4data/EnviDueDiligence_NER")
model = AutoModelForTokenClassification.from_pretrained("d4data/EnviDueDiligence_NER")
```
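Once the pipeline is loaded, its predictions usually need some post-processing. The following is a minimal sketch, not the authors' own code, of running the pipeline with `aggregation_strategy="simple"` (which merges word pieces into whole spans) and grouping the spans by label. The helper names `run_model` and `group_entities`, the `min_score` threshold, and the sample predictions with their `LABEL_A` / `LABEL_B` labels are all hypothetical; the model's actual label names are not listed in this card.

```python
from collections import defaultdict


def run_model(text):
    """Run the NER pipeline on `text` (downloads the model on first use)."""
    # Imported lazily so the grouping helper below works without transformers.
    from transformers import pipeline

    pipe = pipeline(
        "token-classification",
        model="d4data/EnviDueDiligence_NER",
        aggregation_strategy="simple",  # merge sub-word tokens into entity spans
    )
    return pipe(text)


def group_entities(predictions, min_score=0.5):
    """Group predicted spans by label, dropping spans below `min_score`."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Made-up predictions in the shape the aggregated pipeline returns
# (real output also carries "start" and "end" character offsets).
# The label names here are placeholders, not the model's real labels.
sample = [
    {"entity_group": "LABEL_A", "score": 0.97, "word": "chromium"},
    {"entity_group": "LABEL_B", "score": 0.91, "word": "groundwater contamination"},
    {"entity_group": "LABEL_A", "score": 0.31, "word": "ust"},
]

print(group_entities(sample))
```

To use it on real text, replace `sample` with `run_model("...")`. The 0.5 cutoff is an arbitrary illustration and should be tuned on your own documents.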

## Author
This model is part of the research topic "Environmental Due Diligence" conducted by Deepak John Reji and Afreen Aman. If you use this work (code, model, or dataset), please cite:
> Aman, A. and Reji, D.J., 2022. EnvBert: An NLP model for Environmental Due Diligence data classification. Software Impacts, 14, p.100427.

## You can support me here :)
<a href="https://www.buymeacoffee.com/deepakjohnreji" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>