Cyrile committed on
Commit 1fa35c3
1 Parent(s): 1779f1d

Update README.md

Files changed (1)
  1. README.md +16 -4
README.md CHANGED
@@ -7,12 +7,12 @@ widget:
  - text: "Boulanger, habitant à Boulanger et travaillant dans le magasin Boulanger situé dans la ville de Boulanger."
  ---
  DistilCamemBERT-NER
- ==================
 
  We present DistilCamemBERT-NER which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine tuned for the NER (Named Entity Recognition) task for the French language. The work is inspired by [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) based on the [CamemBERT](https://huggingface.co/camembert-base) model. The problem of the modelizations based on CamemBERT is at the scaling moment, for the production phase for example. Indeed, inference cost can be a technological issue. To counteract this effect, we propose this modelization which **divides the inference time by 2** with the same consumption power thanks to [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base).
 
  Dataset
- ----------
 
  The dataset used is [wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr) which represents ~170k sentences labelized in 5 categories :
  * PER: personality ;
@@ -22,7 +22,7 @@ The dataset used is [wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/w
  * O: background (Other).
 
  Evaluation results
- ------------------------
 
  | class | precision (%) | recall (%) | f1 (%) | support (#sub-word) |
  | :----: | :---------: | :-----------: | :-----: | :-----------------: |
@@ -33,8 +33,20 @@ Evaluation results
  | MISC | 88.55 | 81.84 | 85.06 | 13'553 |
  | O | 99.40 | 99.55 | 99.47 | 411'755 |
 
  How to use DistilCamemBERT-NER
- ------------------------------------------------
 
  ```python
  from transformers import pipeline
 
  - text: "Boulanger, habitant à Boulanger et travaillant dans le magasin Boulanger situé dans la ville de Boulanger."
  ---
  DistilCamemBERT-NER
+ ===================
 
  We present DistilCamemBERT-NER which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine tuned for the NER (Named Entity Recognition) task for the French language. The work is inspired by [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) based on the [CamemBERT](https://huggingface.co/camembert-base) model. The problem of the modelizations based on CamemBERT is at the scaling moment, for the production phase for example. Indeed, inference cost can be a technological issue. To counteract this effect, we propose this modelization which **divides the inference time by 2** with the same consumption power thanks to [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base).
 
  Dataset
+ -------
 
  The dataset used is [wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr) which represents ~170k sentences labelized in 5 categories :
  * PER: personality ;
 
  * O: background (Other).
 
  Evaluation results
+ ------------------
 
  | class | precision (%) | recall (%) | f1 (%) | support (#sub-word) |
  | :----: | :---------: | :-----------: | :-----: | :-----------------: |
 
  | MISC | 88.55 | 81.84 | 85.06 | 13'553 |
  | O | 99.40 | 99.55 | 99.47 | 411'755 |
 
+ Benchmark
+ ---------
+
+ This model's performance is compared with 3 reference models (see below) using the [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient) metric. Scores are multiplied by 100, and the delta relative to DistilCamemBERT-NER, taken as the reference, is given in parentheses. Inference time was measured on an AMD Ryzen 5 4500U @ 2.3 GHz with 6 cores:
+
+ | **model** | **PER** | **LOC** | **ORG** | **MISC** | **O** | **inference time** |
+ | :---------------------------------------------------------------------------------------------------------------: | :----------: | :----------: | :----------: | :----------: | :----------: | :----------------: |
+ | [cmarkea/distilcamembert-base-ner](https://huggingface.co/cmarkea/distilcamembert-base-ner) | 93.91 | 88.26 | 84.03 | 82.74 | 91.45 | 43.44 |
+ | [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) | 95.20 (+1%) | 90.85 (+3%) | 89.50 (+6%) | 89.02 (+8%) | 92.86 (+2%) | 83.70 (+93%) |
+ | [Davlan/bert-base-multilingual-cased-ner-hrl](https://huggingface.co/Davlan/bert-base-multilingual-cased-ner-hrl) | 79.93 (-15%) | 70.39 (-22%) | 60.26 (-28%) | NA | 69.95 (-24%) | 87.56 (+102%) |
+ | [flair/ner-french](https://huggingface.co/flair/ner-french) | 80.18 (-15%) | 72.11 (-18%) | 67.29 (-20%) | 72.39 (-17%) | 74.34 (-19%) | 314.96 (+625%) |
+
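The benchmark added in this commit reports a per-class MCC and a delta in parentheses. As a minimal sketch, and not the author's evaluation script, the snippet below computes a one-vs-rest MCC for a single label from two tag sequences, plus a relative delta against a reference score. The function names `mcc_one_vs_rest` and `relative_delta`, the toy tag sequences, and the reading of the delta as a rounded relative change are all assumptions made for illustration. Under that reading, the inference-time deltas in the table (+93%, +102%, +625%) are reproduced.

```python
import math


def mcc_one_vs_rest(y_true, y_pred, label):
    """Matthews correlation coefficient for `label` against all other tags."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1
        elif t != label and p != label:
            tn += 1
        elif p == label:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def relative_delta(score, reference):
    """Relative change of `score` against `reference`, in rounded percent.

    Assumed definition: the README does not spell out how the delta is computed.
    """
    return round((score - reference) / reference * 100)


# Hypothetical toy sequences, one tag per sub-word.
y_true = ["PER", "O", "LOC", "O", "PER", "O"]
y_pred = ["PER", "O", "O", "O", "PER", "PER"]
print(round(100 * mcc_one_vs_rest(y_true, y_pred, "PER"), 2))

# Inference times from the table: reference 43.44, camembert-ner 83.70.
print(relative_delta(83.70, 43.44))  # → 93
```

The MCC is computed from the full confusion matrix, so it stays informative when one class (here `O`) dominates the support, which is the case for the sub-word counts in the evaluation table.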
  How to use DistilCamemBERT-NER
+ ------------------------------
 
  ```python
  from transformers import pipeline