Jean-Baptiste committed on
Commit ea64d5f
1 Parent(s): 12cca13

Update README.md

Files changed (1)
  1. README.md +33 -31
README.md CHANGED
@@ -24,16 +24,18 @@ Training data was classified as follow:
 
  Abbreviation|Description
  -|-
- O| Outside of a named entity
- MISC | Miscellaneous entity
- PER | Person’s name
- ORG | Organization
- LOC | Location
+ O |Outside of a named entity
+ MISC |Miscellaneous entity
+ PER |Person’s name
+ ORG |Organization
+ LOC |Location
 
  In order to simplify, the prefix B- or I- from original conll2003 was removed.
  I used the train and test dataset from original conll2003 for training and the "validation" dataset for validation. This resulted in a dataset of size:
- Train | 17494
- Validation | 3250
+
+ Train | Validation
+ -|-
+ 17494 | 3250
 
  ## How to use camembert-ner with HuggingFace
 
@@ -90,31 +92,31 @@ nlp("Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne to
  ## Model performances
 
  Model performances computed on conll2003 validation dataset (computed on the tokens predictions)
- ```
- entity | precision | recall | f1
- - | - | - | -
- PER | 0.9914 | 0.9927 | 0.9920
- ORG | 0.9627 | 0.9661 | 0.9644
- LOC | 0.9795 | 0.9862 | 0.9828
- MISC | 0.9292 | 0.9262 | 0.9277
- Overall | 0.9740 | 0.9766 | 0.9753
- ```
+
+ entity|precision|recall|f1
+ -|-|-|-
+ PER|0.9914|0.9927|0.9920
+ PER|0.9914|0.9927|0.9920
+ ORG|0.9627|0.9661|0.9644
+ LOC|0.9795|0.9862|0.9828
+ MISC|0.9292|0.9262|0.9277
+ Overall|0.9740|0.9766|0.9753
+
 
  On private dataset (email, chat, informal discussion), computed on word predictions:
- ```
- entity | precision | recall | f1
- - | - | - | -
- PER | 0.8823 | 0.9116 | 0.8967
- ORG | 0.7694 | 0.7292 | 0.7487
- LOC | 0.8619 | 0.7768 | 0.8171
- ```
 
- Spacy (en_core_web_trf-3.2.0) on the same private dataset was giving:
- ```
- entity | precision | recall | f1
- - | - | - | -
- PER | 0.9146 | 0.8287 | 0.8695
- ORG | 0.7655 | 0.6437 | 0.6993
- LOC | 0.8727 | 0.6180 | 0.7236
- ```
+ entity|precision|recall|f1
+ -|-|-|-
+ PER|0.8823|0.9116|0.8967
+ ORG|0.7694|0.7292|0.7487
+ LOC|0.8619|0.7768|0.8171
+
+ By comparison on the same private dataset, Spacy (en_core_web_trf-3.2.0) was giving:
+
+ entity|precision|recall|f1
+ -|-|-|-
+ PER|0.9146|0.8287|0.8695
+ ORG|0.7655|0.6437|0.6993
+ LOC|0.8727|0.6180|0.7236
+
 
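For readers of this diff: the "## How to use camembert-ner with HuggingFace" heading and the `nlp("Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne to ...")` call visible in the second hunk header refer to a usage snippet that is not part of this change. A minimal sketch of that usage pattern, assuming the standard transformers token-classification pipeline, is shown below; the repository id and the `aggregation_strategy` argument are assumptions (inferred from the section heading), not taken from this diff, so check the full README for the exact snippet.

```python
# Minimal sketch of loading the model and running NER with transformers.
# Assumption: the model is hosted at "Jean-Baptiste/camembert-ner" (inferred
# from the section heading) and sub-word predictions are merged into
# word-level entities via aggregation_strategy="simple".
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/camembert-ner")
model = AutoModelForTokenClassification.from_pretrained("Jean-Baptiste/camembert-ner")

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Returns a list of dicts with entity_group (PER/ORG/LOC/MISC), score, word, start, end
print(nlp("Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne."))
```

With sub-word aggregation enabled the pipeline reports word-level entities, which lines up with the word-prediction evaluation quoted for the private dataset, whereas the conll2003 numbers above are described as token-level.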