Update README.md
README.md
CHANGED
@@ -1,3 +1,42 @@
 ---
-license:
+license: apache-2.0
+language:
+- en
+tags:
+- Token Classification
+widget:
+- text: "Monitored Natural Attenuation (MNA) and, if necessary as a contingency, In Situ Chemical Oxidation (ISCO) to address ISCO involves the injection of a strong chemical oxidant to chemically treat the before the ISCO contingency can be implemented at the spill site."
+  example_title: "example 1"
+- text: "Site was identified as a potential source of groundwater contamination after the City performed Assessments were investigated further for potential contamination."
+  example_title: "example 2"
+- text: "TCE releases from the UST is probably a major contributor to groundwater contamination in this area."
+  example_title: "example 3"
 ---
+## About the Model
+An environmental named entity recognition (NER) model, trained on a USEPA dataset to recognize seven environmental due diligence entity types in a given text corpus (remediation reports, records of decision, 5-year records, etc.). The model is built on top of distilbert-base-uncased.
+
+- Dataset: https://data.mendeley.com/datasets/tx6vmd4g9p/4
+- Dataset Research Paper: https://doi.org/10.1016/j.dib.2022.108579
+
+## Usage
+The easiest way to try the model is the Hugging Face Inference API. Alternatively, use the pipeline object offered by the transformers library, or load the tokenizer and model directly.
+```python
+
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+pipe = pipeline("token-classification", model="d4data/EnviDueDiligence_NER")
+
+# Load model directly
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+tokenizer = AutoTokenizer.from_pretrained("d4data/EnviDueDiligence_NER")
+model = AutoModelForTokenClassification.from_pretrained("d4data/EnviDueDiligence_NER")
+
+```
+
+## Author
+This model is part of the research topic "Environmental Due Diligence" conducted by Deepak John Reji and Afreen Aman. If you use this work (code, model or dataset), please cite:
+> Aman, A. and Reji, D.J., 2022. EnvBert: An NLP model for Environmental Due Diligence data classification. Software Impacts, 14, p.100427.
+
+## You can support me here :)
+<a href="https://www.buymeacoffee.com/deepakjohnreji" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
+
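The Usage snippet in the README builds the pipeline but does not show how its output might be consumed. The following is a minimal sketch, not part of the commit. It assumes the pipeline is created with `aggregation_strategy="simple"`, so that word pieces are merged into whole entity spans. The helper names `load_ner_pipeline` and `filter_entities` and the `min_score` threshold of 0.5 are hypothetical choices made for illustration.

```python
from typing import Dict, List


def load_ner_pipeline(model_id: str = "d4data/EnviDueDiligence_NER"):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that filter_entities below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into entity spans,
    # so each result carries an "entity_group" key instead of a per-token "entity".
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def filter_entities(entities: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep aggregated entities scoring at least min_score (hypothetical threshold)."""
    return [
        {
            "label": ent["entity_group"],
            "text": ent["word"],
            "start": ent["start"],
            "end": ent["end"],
        }
        for ent in entities
        if ent["score"] >= min_score
    ]
```

With the model available locally or over the network, `filter_entities(load_ner_pipeline()(text))` would return one dictionary per detected span, where `start` and `end` are character offsets into `text`. The label names depend on the model's own configuration and are not assumed here.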