---
language: fr
datasets:
- Jean-Baptiste/wikiner_fr
widget:
- text: "Je m'appelle jean-baptiste et je vis à montréal"
---

# camembert-ner: a model fine-tuned from camemBERT for the NER task

## Introduction

camembert-ner is a NER model fine-tuned from camemBERT on the wikiner-fr dataset (~170,634 sentences).
The model was validated on emails/chat data and outperformed other models on this type of data specifically.
In particular, it seems to work better on entities that do not start with an uppercase letter.


## How to use camembert-ner with HuggingFace

##### Load camembert-ner and its sub-word tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/camembert-ner")
model = AutoModelForTokenClassification.from_pretrained("Jean-Baptiste/camembert-ner")
```

##### Process text sample (from Wikipedia)

```python
from transformers import pipeline

nlp = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
nlp("Apple est créée le 1er avril 1976 dans le garage de la maison d'enfance de Steve Jobs à Los Altos en Californie par Steve Jobs, Steve Wozniak et Ronald Wayne14, puis constituée sous forme de société le 3 janvier 1977 à l'origine sous le nom d'Apple Computer, mais pour ses 30 ans et pour refléter la diversification de ses produits, le mot « computer » est retiré le 9 janvier 2015.")
```

```
[{'entity_group': 'ORG',
  'score': 0.9472818374633789,
  'word': 'Apple',
  'start': 0,
  'end': 5},
 {'entity_group': 'PER',
  'score': 0.9838564991950989,
  'word': 'Steve Jobs',
  'start': 74,
  'end': 85},
 {'entity_group': 'LOC',
  'score': 0.9831605950991312,
  'word': 'Los Altos',
  'start': 87,
  'end': 97},
 {'entity_group': 'LOC',
  'score': 0.9834540486335754,
  'word': 'Californie',
  'start': 100,
  'end': 111},
 {'entity_group': 'PER',
  'score': 0.9841555754343668,
  'word': 'Steve Jobs',
  'start': 115,
  'end': 126},
 {'entity_group': 'PER',
  'score': 0.9843501806259155,
  'word': 'Steve Wozniak',
  'start': 127,
  'end': 141},
 {'entity_group': 'PER',
  'score': 0.9841533899307251,
  'word': 'Ronald Wayne',
  'start': 144,
  'end': 157},
 {'entity_group': 'ORG',
  'score': 0.9468960364659628,
  'word': 'Apple Computer',
  'start': 243,
  'end': 257}]
```
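Each item in the returned list is a dictionary with `entity_group`, `score`, `word`, `start` and `end` keys, so downstream use is plain list filtering. Below is a minimal sketch, not part of the original card, that keeps only confident person and location mentions; the `extract_entities` helper, the 0.9 cut-off and the example sentence are illustrative assumptions. Note that on recent versions of transformers, `grouped_entities=True` is deprecated in favour of `aggregation_strategy="simple"`.

```python
from transformers import pipeline

# Hypothetical post-processing (not from the original card): load the hosted
# checkpoint directly through the pipeline API and keep only confident
# PER/LOC predictions.
nlp = pipeline("ner", model="Jean-Baptiste/camembert-ner", grouped_entities=True)

def extract_entities(predictions, types=("PER", "LOC"), min_score=0.9):
    """Keep entities of the requested types whose score clears the threshold."""
    return [
        (p["word"], p["entity_group"], round(float(p["score"]), 3))
        for p in predictions
        if p["entity_group"] in types and p["score"] >= min_score
    ]

predictions = nlp("Apple a été fondée par Steve Jobs et Steve Wozniak à Los Altos, en Californie.")
print(extract_entities(predictions))
```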


## Model performance (metric: seqeval)

Global
```
'precision': 0.8859
'recall': 0.8971
'f1': 0.8914
```

By entity
```
'LOC': {'precision': 0.8905576596578294,
        'recall': 0.900554675118859,
        'f1': 0.8955282684352223},
'MISC': {'precision': 0.8175627240143369,
         'recall': 0.8117437722419929,
         'f1': 0.8146428571428571},
'ORG': {'precision': 0.8099480326651819,
        'recall': 0.8265151515151515,
        'f1': 0.8181477315335584},
'PER': {'precision': 0.9372509960159362,
        'recall': 0.959812321501428,
        'f1': 0.9483975005039308}
```
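
The card does not include the evaluation script, but the figures above are standard seqeval entity-level precision/recall/F1 scores. A minimal sketch of how comparable numbers could be reproduced, assuming gold and predicted IOB2 tag sequences for the validation split (the short sequences below are placeholders, not real wikiner-fr data):

```python
from seqeval.metrics import classification_report, f1_score

# Placeholder tag sequences; in practice these would be the gold labels of the
# wikiner-fr validation split and the model's predictions, one list per sentence.
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"], ["B-ORG", "O", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-LOC"], ["B-ORG", "O", "B-MISC"]]

print(f1_score(y_true, y_pred))                         # micro-averaged entity-level F1
print(classification_report(y_true, y_pred, digits=4))  # per-entity precision/recall/F1
```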