tomaarsen HF staff committed on
Commit
a591d3c
1 Parent(s): ee44a17

Upload model

Files changed (2)
  1. README.md +39 -43
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  language:
  - en
  library_name: span-marker
  tags:
  - span-marker
@@ -15,55 +16,55 @@ metrics:
  - recall
  - f1
  widget:
- - text: Hallacas are also commonly consumed in eastern Cuba parts of Colombia, Ecuador,
- Aruba, and Curaçao.
- - text: The co-production of Yvon Michel's GYM and Jean Bédard's Interbox promotions
- and televised via HBO, has trumped a proposed HBO -televised rematch between Jean
- Pascal and RING and WBC 175-pound champion Chad Dawson that was slated for the
- same date at Bell Centre in Montreal.
- - text: The synoptic conditions see a low over southern Norway, bringing warm south
- and southwesterly flows of air up from the inner continental areas of Russia and
- Belarus.
- - text: The RCIS recommended amongst other things that the Australian Security Intelligence
- Organisation (ASIO) areas of investigation be widened to include terrorism.
- - text: The large network had multiple campuses in Minnesota, Wisconsin, and South
- Dakota.
  pipeline_tag: token-classification
  co2_eq_emissions:
- emissions: 532.6472478623315
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
- hours_used: 3.696
  hardware_used: 1 x NVIDIA GeForce RTX 3090
  base_model: bert-base-cased
  model-index:
- - name: SpanMarker with bert-base-cased on FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD
  results:
  - task:
  type: token-classification
  name: Named Entity Recognition
  dataset:
- name: FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD
  type: tomaarsen/ner-orgs
  split: test
  metrics:
  - type: f1
- value: 0.8311343653918766
  name: F1
  - type: precision
- value: 0.8334090564894745
  name: Precision
  - type: recall
- value: 0.8288720574945131
  name: Recall
  ---
 
- # SpanMarker with bert-base-cased on FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD
 
- This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD](https://huggingface.co/datasets/tomaarsen/ner-orgs) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder.
 
  ## Model Details
 
@@ -72,9 +73,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained
  - **Encoder:** [bert-base-cased](https://huggingface.co/bert-base-cased)
  - **Maximum Sequence Length:** 256 tokens
  - **Maximum Entity Length:** 8 words
- - **Training Dataset:** [FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD](https://huggingface.co/datasets/tomaarsen/ner-orgs)
  - **Language:** en
- <!-- - **License:** Unknown -->
 
  ### Model Sources
 
@@ -84,15 +85,15 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained
  ### Model Labels
  | Label | Examples |
  |:------|:---------------------------------------------|
- | ORG | "IAEA", "Church 's Chicken", "Texas Chicken" |
 
  ## Evaluation
 
  ### Metrics
  | Label | Precision | Recall | F1 |
  |:--------|:----------|:-------|:-------|
- | **all** | 0.8334 | 0.8289 | 0.8311 |
- | ORG | 0.8334 | 0.8289 | 0.8311 |
 
  ## Uses
 
@@ -104,7 +105,7 @@ from span_marker import SpanMarkerModel
  # Download from the 🤗 Hub
  model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-orgs")
  # Run inference
- entities = model.predict("The large network had multiple campuses in Minnesota, Wisconsin, and South Dakota.")
  ```
 
  ### Downstream Use
@@ -155,8 +156,8 @@ trainer.save_model("tomaarsen/span-marker-bert-base-orgs-finetuned")
  ### Training Set Metrics
  | Training set | Min | Median | Max |
  |:----------------------|:----|:--------|:----|
- | Sentence length | 1 | 22.1911 | 267 |
- | Entities per sentence | 0 | 0.8144 | 39 |
 
  ### Training Hyperparameters
  - learning_rate: 5e-05
@@ -169,22 +170,17 @@ trainer.save_model("tomaarsen/span-marker-bert-base-orgs-finetuned")
  - num_epochs: 3
 
  ### Training Results
- | Epoch | Step | Validation Loss |
- |:------:|:-----:|:---------------:|
- | 0.3273 | 3000 | 0.0052 |
- | 0.6546 | 6000 | 0.0047 |
- | 0.9819 | 9000 | 0.0045 |
- | 1.3092 | 12000 | 0.0047 |
- | 1.6365 | 15000 | 0.0045 |
- | 1.9638 | 18000 | 0.0046 |
- | 2.2911 | 21000 | 0.0054 |
- | 2.6184 | 24000 | 0.0053 |
- | 2.9457 | 27000 | 0.0052 |
 
  ### Environmental Impact
  Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- - **Carbon Emitted**: 0.533 kg of CO2
- - **Hours Used**: 3.696 hours
 
  ### Training Hardware
  - **On Cloud**: No
 
  ---
  language:
  - en
+ license: cc-by-sa-4.0
  library_name: span-marker
  tags:
  - span-marker
 
  - recall
  - f1
  widget:
+ - text: Today in Zhongnanhai, General Secretary of the Communist Party of China, President
+ of the country and honorary President of China's Red Cross, Zemin Jiang met with
+ representatives of the 6th National Member Congress of China's Red Cross, and
+ expressed warm greetings to the 20 million hardworking members on behalf of the
+ Central Committee of the Chinese Communist Party and State Council.
+ - text: On April 20, 2017, MGM Television Studios, headed by Mark Burnett formed a
+ partnership with McLane and Buss to produce and distribute new content across
+ a number of media platforms.
+ - text: 'Postponed: East Fife v Clydebank, St Johnstone v'
+ - text: Prime contractor was Hughes Aircraft Company Electronics Division which developed
+ the Tiamat with the assistance of the NACA.
+ - text: After graduating from Auburn University with a degree in Engineering in 1985,
+ he went on to play inside linebacker for the Pittsburgh Steelers for four seasons.
  pipeline_tag: token-classification
  co2_eq_emissions:
+ emissions: 248.1008753496152
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
+ hours_used: 1.766
  hardware_used: 1 x NVIDIA GeForce RTX 3090
  base_model: bert-base-cased
  model-index:
+ - name: SpanMarker with bert-base-cased on FewNERD, CoNLL2003, and OntoNotes v5
  results:
  - task:
  type: token-classification
  name: Named Entity Recognition
  dataset:
+ name: FewNERD, CoNLL2003, and OntoNotes v5
  type: tomaarsen/ner-orgs
  split: test
  metrics:
  - type: f1
+ value: 0.7946954813359528
  name: F1
  - type: precision
+ value: 0.7958325880879986
  name: Precision
  - type: recall
+ value: 0.793561619404316
  name: Recall
  ---
 
+ # SpanMarker with bert-base-cased on FewNERD, CoNLL2003, and OntoNotes v5
 
+ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD, CoNLL2003, and OntoNotes v5](https://huggingface.co/datasets/tomaarsen/ner-orgs) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder.
 
  ## Model Details
 
 
  - **Encoder:** [bert-base-cased](https://huggingface.co/bert-base-cased)
  - **Maximum Sequence Length:** 256 tokens
  - **Maximum Entity Length:** 8 words
+ - **Training Dataset:** [FewNERD, CoNLL2003, and OntoNotes v5](https://huggingface.co/datasets/tomaarsen/ner-orgs)
  - **Language:** en
+ - **License:** cc-by-sa-4.0
 
  ### Model Sources
 
 
  ### Model Labels
  | Label | Examples |
  |:------|:---------------------------------------------|
+ | ORG | "Texas Chicken", "IAEA", "Church 's Chicken" |
 
  ## Evaluation
 
  ### Metrics
  | Label | Precision | Recall | F1 |
  |:--------|:----------|:-------|:-------|
+ | **all** | 0.7958 | 0.7936 | 0.7947 |
+ | ORG | 0.7958 | 0.7936 | 0.7947 |
 
  ## Uses
 
 
  # Download from the 🤗 Hub
  model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-orgs")
  # Run inference
+ entities = model.predict("Postponed: East Fife v Clydebank, St Johnstone v")
  ```
 
  ### Downstream Use
 
  ### Training Set Metrics
  | Training set | Min | Median | Max |
  |:----------------------|:----|:--------|:----|
+ | Sentence length | 1 | 23.5706 | 263 |
+ | Entities per sentence | 0 | 0.7865 | 39 |
 
  ### Training Hyperparameters
  - learning_rate: 5e-05
 
  - num_epochs: 3
 
  ### Training Results
+ | Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+ |:------:|:-----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+ | 0.7131 | 3000 | 0.0061 | 0.7978 | 0.7830 | 0.7904 | 0.9764 |
+ | 1.4262 | 6000 | 0.0059 | 0.8170 | 0.7843 | 0.8004 | 0.9774 |
+ | 2.1393 | 9000 | 0.0061 | 0.8221 | 0.7938 | 0.8077 | 0.9772 |
+ | 2.8524 | 12000 | 0.0062 | 0.8211 | 0.8003 | 0.8106 | 0.9780 |
 
  ### Environmental Impact
  Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Carbon Emitted**: 0.248 kg of CO2
+ - **Hours Used**: 1.766 hours
 
  ### Training Hardware
  - **On Cloud**: No
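
Review note: the updated front matter and the rendered tables in README.md report the same quantities at different precisions, so a quick consistency check may be useful. The sketch below is not part of the commit. It recomputes F1 as the harmonic mean of the reported precision and recall, and converts the `co2_eq_emissions.emissions` value to kilograms, assuming that field is expressed in grams. The function name `f1_from` is hypothetical.

```python
# Values copied from the updated README.md front matter in this commit.
precision = 0.7958325880879986
recall = 0.793561619404316
reported_f1 = 0.7946954813359528
emissions_g = 248.1008753496152  # assumption: co2_eq_emissions.emissions is in grams
reported_kg = 0.248              # "Carbon Emitted" line in the README body


def f1_from(p: float, r: float) -> float:
    """Return the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


f1 = f1_from(precision, recall)
# Allow a small tolerance for floating-point rounding.
f1_matches = abs(f1 - reported_f1) < 1e-6
kg_matches = round(emissions_g / 1000, 3) == reported_kg

print(f"F1 from precision/recall: {f1:.4f} (matches reported: {f1_matches})")
print(f"Emissions: {emissions_g / 1000:.3f} kg (matches reported: {kg_matches})")
```

If either check printed `False`, the front matter and the README body would have drifted apart and one of them would need regenerating.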
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f64ee8bee4e465b21fba71e70d47d4bb19ba4eef09d7565dc544b41248ae8e58
  size 433332917
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:55ca4260a3118b42791a244aa1d7981a524aa53b6033730ec8a6f1fba949ee04
  size 433332917
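
Review note: only the `oid` changes in the `pytorch_model.bin` pointer, while `size` stays at 433332917 bytes. That is what one would expect when the weights are retrained with an unchanged architecture. The sketch below is not part of the commit. It shows one way to check downloaded bytes against a Git LFS pointer like the one above. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo uses a tiny in-memory payload rather than the real weights.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if data has the size and SHA-256 digest recorded in pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# The new pointer from this commit.
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:55ca4260a3118b42791a244aa1d7981a524aa53b6033730ec8a6f1fba949ee04\n"
    "size 433332917\n"
)

# Demo with a small stand-in payload (not the real model file).
demo_data = b"hello"
demo_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(demo_data).hexdigest()}\n"
    f"size {len(demo_data)}\n"
)
print(matches_pointer(demo_data, demo_pointer))
```

For a file of this size, a real check would probably hash the file in chunks instead of loading it into memory at once.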