Upload model
- README.md +39 -43
- pytorch_model.bin +1 -1
README.md
CHANGED
@@ -1,6 +1,7 @@
 ---
 language:
 - en
+license: cc-by-sa-4.0
 library_name: span-marker
 tags:
 - span-marker
@@ -15,55 +16,55 @@ metrics:
 - recall
 - f1
 widget:
-- text:
-
-
-
-
-
-
-
-
-- text:
-
-- text:
-
+- text: Today in Zhongnanhai, General Secretary of the Communist Party of China, President
+    of the country and honorary President of China's Red Cross, Zemin Jiang met with
+    representatives of the 6th National Member Congress of China's Red Cross, and
+    expressed warm greetings to the 20 million hardworking members on behalf of the
+    Central Committee of the Chinese Communist Party and State Council.
+- text: On April 20, 2017, MGM Television Studios, headed by Mark Burnett formed a
+    partnership with McLane and Buss to produce and distribute new content across
+    a number of media platforms.
+- text: 'Postponed: East Fife v Clydebank, St Johnstone v'
+- text: Prime contractor was Hughes Aircraft Company Electronics Division which developed
+    the Tiamat with the assistance of the NACA.
+- text: After graduating from Auburn University with a degree in Engineering in 1985,
+    he went on to play inside linebacker for the Pittsburgh Steelers for four seasons.
 pipeline_tag: token-classification
 co2_eq_emissions:
-  emissions:
+  emissions: 248.1008753496152
   source: codecarbon
   training_type: fine-tuning
   on_cloud: false
   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
   ram_total_size: 31.777088165283203
-  hours_used:
+  hours_used: 1.766
   hardware_used: 1 x NVIDIA GeForce RTX 3090
 base_model: bert-base-cased
 model-index:
-- name: SpanMarker with bert-base-cased on FewNERD, CoNLL2003, OntoNotes v5
+- name: SpanMarker with bert-base-cased on FewNERD, CoNLL2003, and OntoNotes v5
   results:
   - task:
       type: token-classification
       name: Named Entity Recognition
     dataset:
-      name: FewNERD, CoNLL2003, OntoNotes v5
+      name: FewNERD, CoNLL2003, and OntoNotes v5
       type: tomaarsen/ner-orgs
       split: test
     metrics:
     - type: f1
-      value: 0.
+      value: 0.7946954813359528
       name: F1
     - type: precision
-      value: 0.
+      value: 0.7958325880879986
       name: Precision
     - type: recall
-      value: 0.
+      value: 0.793561619404316
       name: Recall
 ---

-# SpanMarker with bert-base-cased on FewNERD, CoNLL2003, OntoNotes v5
+# SpanMarker with bert-base-cased on FewNERD, CoNLL2003, and OntoNotes v5

-This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD, CoNLL2003, OntoNotes v5
+This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD, CoNLL2003, and OntoNotes v5](https://huggingface.co/datasets/tomaarsen/ner-orgs) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder.

 ## Model Details

@@ -72,9 +73,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained
 - **Encoder:** [bert-base-cased](https://huggingface.co/bert-base-cased)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
-- **Training Dataset:** [FewNERD, CoNLL2003, OntoNotes v5
+- **Training Dataset:** [FewNERD, CoNLL2003, and OntoNotes v5](https://huggingface.co/datasets/tomaarsen/ner-orgs)
 - **Language:** en
-
+- **License:** cc-by-sa-4.0

 ### Model Sources

@@ -84,15 +85,15 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained
 ### Model Labels
 | Label | Examples |
 |:------|:---------------------------------------------|
-| ORG | "IAEA", "Church 's Chicken"
+| ORG | "Texas Chicken", "IAEA", "Church 's Chicken" |

 ## Evaluation

 ### Metrics
 | Label | Precision | Recall | F1 |
 |:--------|:----------|:-------|:-------|
-| **all** | 0.
-| ORG | 0.
+| **all** | 0.7958 | 0.7936 | 0.7947 |
+| ORG | 0.7958 | 0.7936 | 0.7947 |

 ## Uses

@@ -104,7 +105,7 @@ from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
 model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-orgs")
 # Run inference
-entities = model.predict("
+entities = model.predict("Postponed: East Fife v Clydebank, St Johnstone v")
 ```

 ### Downstream Use
@@ -155,8 +156,8 @@ trainer.save_model("tomaarsen/span-marker-bert-base-orgs-finetuned")
 ### Training Set Metrics
 | Training set | Min | Median | Max |
 |:----------------------|:----|:--------|:----|
-| Sentence length | 1 |
-| Entities per sentence | 0 | 0.
+| Sentence length | 1 | 23.5706 | 263 |
+| Entities per sentence | 0 | 0.7865 | 39 |

 ### Training Hyperparameters
 - learning_rate: 5e-05
@@ -169,22 +170,17 @@ trainer.save_model("tomaarsen/span-marker-bert-base-orgs-finetuned")
 - num_epochs: 3

 ### Training Results
-| Epoch | Step | Validation Loss |
-
-| 0.
-
-
-
-| 1.6365 | 15000 | 0.0045 |
-| 1.9638 | 18000 | 0.0046 |
-| 2.2911 | 21000 | 0.0054 |
-| 2.6184 | 24000 | 0.0053 |
-| 2.9457 | 27000 | 0.0052 |
+| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+|:------:|:-----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+| 0.7131 | 3000 | 0.0061 | 0.7978 | 0.7830 | 0.7904 | 0.9764 |
+| 1.4262 | 6000 | 0.0059 | 0.8170 | 0.7843 | 0.8004 | 0.9774 |
+| 2.1393 | 9000 | 0.0061 | 0.8221 | 0.7938 | 0.8077 | 0.9772 |
+| 2.8524 | 12000 | 0.0062 | 0.8211 | 0.8003 | 0.8106 | 0.9780 |

 ### Environmental Impact
 Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
-- **Carbon Emitted**: 0.
-- **Hours Used**:
+- **Carbon Emitted**: 0.248 kg of CO2
+- **Hours Used**: 1.766 hours

 ### Training Hardware
 - **On Cloud**: No
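The "Uses" snippet updated above stops at a single `model.predict` call. Below is a slightly fuller sketch, assuming the usual SpanMarker API: `SpanMarkerModel.from_pretrained` plus `predict`, which returns a list of dicts whose keys (such as "span", "label", "score") may differ between library versions. It runs on one of the widget sentences added in this commit.

```python
# Minimal inference sketch for this model, extending the "Uses" snippet in the
# updated README. The dict keys read below ("span", "label", "score") are an
# assumption about the SpanMarker return format and may vary by version.
from span_marker import SpanMarkerModel

# Download the checkpoint (the ~433 MB pytorch_model.bin) from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-orgs")

# One of the widget examples added in this commit
sentence = (
    "On April 20, 2017, MGM Television Studios, headed by Mark Burnett formed a "
    "partnership with McLane and Buss to produce and distribute new content across "
    "a number of media platforms."
)

# Each prediction describes one ORG span found in the text
for entity in model.predict(sentence):
    print(entity["span"], entity["label"], round(entity["score"], 3))
```

`predict` should also accept a list of sentences for batched inference, which is the more typical way to process a full dataset; the single-string call above mirrors the README example.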
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:55ca4260a3118b42791a244aa1d7981a524aa53b6033730ec8a6f1fba949ee04
 size 433332917
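The pytorch_model.bin entry is a Git LFS pointer rather than the weights themselves: only the `oid sha256:` and `size` lines identify the binary, so this commit swaps in a new hash while the size stays at 433332917 bytes. A minimal sketch, assuming a locally downloaded copy of the file, for checking it against the new pointer values (the file path is a placeholder):

```python
# Sketch: verify a downloaded pytorch_model.bin against the Git LFS pointer
# committed here. The path below is a placeholder; the oid and size are the
# values from the new pointer in this commit.
import hashlib
import os

EXPECTED_SHA256 = "55ca4260a3118b42791a244aa1d7981a524aa53b6033730ec8a6f1fba949ee04"
EXPECTED_SIZE = 433332917  # bytes, from the "size" line of the pointer

path = "pytorch_model.bin"  # placeholder: local path to the downloaded weights

# Hash the file in 1 MB chunks so the full ~433 MB never sits in memory at once
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print("size ok:  ", os.path.getsize(path) == EXPECTED_SIZE)
print("sha256 ok:", digest.hexdigest() == EXPECTED_SHA256)
```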