widget:
- text: "Boulanger, habitant à Boulanger et travaillant dans le magasin Boulanger situé dans la ville de Boulanger."
---

DistilCamemBERT-NER
===================

We present DistilCamemBERT-NER, which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for the NER (Named Entity Recognition) task on French text. This work is inspired by [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner), based on the [CamemBERT](https://huggingface.co/camembert-base) model. The problem with CamemBERT-based models appears at scaling time, for example in the production phase, where inference cost can become a technological issue. To counteract this effect, we propose this model, which **divides the inference time by 2** with the same power consumption, thanks to [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base).

Dataset
-------

The dataset used is [wikiner_fr](https://huggingface.co/datasets/Jean-Baptiste/wikiner_fr), which contains ~170k sentences labeled in 5 categories:
* PER: personality ;
* LOC: location ;
* ORG: organization ;
* MISC: miscellaneous entities ;
* O: background (Other).
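
At prediction time, each sub-word receives one of these class labels, and consecutive tokens sharing the same non-O label form a single entity span. A minimal plain-Python sketch of that grouping, using hypothetical tags for the widget sentence (not actual model output):

```python
def group_entities(tagged):
    """Group ("token", "class") pairs into (entity_text, class) spans,
    merging consecutive tokens that share the same non-O class."""
    spans, current, label = [], [], None
    for token, tag in tagged + [("", "O")]:  # sentinel flushes the last span
        if tag == label and tag != "O":
            current.append(token)
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = ([token], tag) if tag != "O" else ([], None)
    return spans

# Hypothetical tags for the start of the widget sentence (illustrative only):
spans = group_entities(
    [("Boulanger", "PER"), (",", "O"), ("habitant", "O"), ("à", "O"), ("Boulanger", "LOC")]
)
print(spans)  # → [('Boulanger', 'PER'), ('Boulanger', 'LOC')]
```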

Evaluation results
------------------

| class | precision (%) | recall (%) | f1 (%) | support (#sub-word) |
| :---: | :-----------: | :--------: | :----: | :-----------------: |
| MISC  |     88.55     |   81.84    | 85.06  |       13'553        |
|   O   |     99.40     |   99.55    | 99.47  |      411'755        |

Benchmark
---------

The performance of this model is compared to 3 reference models (see below) with the [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient) metric. Scores are given with a factor of x100, and the relative gain over DistilCamemBERT-NER, used as the reference, is shown in parentheses. For the inference time measurement, an AMD Ryzen 5 4500U @ 2.3 GHz with 6 cores was used:

| **model** | **PER** | **LOC** | **ORG** | **MISC** | **O** | **inference time** |
| :-------: | :-----: | :-----: | :-----: | :------: | :---: | :----------------: |
| [cmarkea/distilcamembert-base-ner](https://huggingface.co/cmarkea/distilcamembert-base-ner) | 93.91 | 88.26 | 84.03 | 82.74 | 91.45 | 43.44 |
| [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) | 95.20 (+1%) | 90.85 (+3%) | 89.50 (+6%) | 89.02 (+8%) | 92.86 (+2%) | 83.70 (+93%) |
| [Davlan/bert-base-multilingual-cased-ner-hrl](https://huggingface.co/Davlan/bert-base-multilingual-cased-ner-hrl) | 79.93 (-15%) | 70.39 (-22%) | 60.26 (-28%) | NA | 69.95 (-24%) | 87.56 (+102%) |
| [flair/ner-french](https://huggingface.co/flair/ner-french) | 80.18 (-15%) | 72.11 (-18%) | 67.29 (-20%) | 72.39 (-17%) | 74.34 (-19%) | 314.96 (+625%) |
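
The MCC scores above can be computed per class in a one-vs-rest fashion from confusion-matrix counts. A minimal sketch in plain Python, with toy counts that are purely illustrative and not taken from the benchmark:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy one-vs-rest counts for a single entity class (illustrative only):
score = 100 * mcc(tp=90, tn=400, fp=10, fn=8)  # same x100 factor as the table
print(round(score, 2))
```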

How to use DistilCamemBERT-NER
------------------------------

```python
from transformers import pipeline

# "simple" aggregation merges sub-word predictions into whole entity spans.
ner = pipeline(
    task="ner",
    model="cmarkea/distilcamembert-base-ner",
    tokenizer="cmarkea/distilcamembert-base-ner",
    aggregation_strategy="simple",
)

ner("Boulanger, habitant à Boulanger et travaillant dans le magasin Boulanger situé dans la ville de Boulanger.")
```