---
language: it
datasets:
- xtreme
---

# Italian-Bert (Italian Bert) + POS 🎃🍷

This model is a version of [Bert Base Italian](https://huggingface.co/dbmdz/bert-base-italian-cased) fine-tuned on [xtreme udpos Italian](https://huggingface.co/nlp/viewer/?dataset=xtreme&config=udpos.Italian) for the **POS** downstream task.

## Details of the downstream task (POS) - Dataset

- [Dataset: xtreme udpos Italian](https://huggingface.co/nlp/viewer/?dataset=xtreme&config=udpos.Italian) 📚 (a loading sketch follows the label list below)

| Dataset                | # Examples |
| ---------------------- | ----- |
| Train                  | 716 K |
| Dev                    | 85 K |

- [Fine-tuning script for NER/token classification provided by @stefan-it](https://raw.githubusercontent.com/stefan-it/fine-tuned-berts-seq/master/scripts/preprocess.py)

- Labels covered:

```
ADJ
ADP
ADV
AUX
CCONJ
DET
INTJ
NOUN
NUM
PART
PRON
PROPN
PUNCT
SCONJ
SYM
VERB
X
```
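
The split sizes and the tag set above can be inspected directly with the `datasets` library. The snippet below is only a minimal sketch, assuming the usual `xtreme`/`udpos.Italian` layout on the Hub; column names such as `pos_tags` are an assumption, not taken from this card.

```python
# Minimal sketch (not from the original card): inspecting the xtreme
# udpos.Italian data with the `datasets` library. Column names like
# "tokens"/"pos_tags" follow the usual udpos layout and are assumed here.
from datasets import load_dataset

dataset = load_dataset("xtreme", "udpos.Italian")

print(dataset)              # train / validation / test splits with their sizes
print(dataset["train"][0])  # e.g. {'tokens': [...], 'pos_tags': [...]}

# The POS tag names (ADJ, ADP, ..., X) are stored in the feature metadata.
label_names = dataset["train"].features["pos_tags"].feature.names
print(label_names)
```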

## Metrics on evaluation set 🧾

|  Metric   |  # score  |
| :-------: | :-------: |
| F1        | **97.25** |
| Precision | **97.15** |
| Recall    | **97.36** |

## Model in action 🔨


Example of usage

```python
from transformers import pipeline

# Both the model and the tokenizer point at this fine-tuned Italian checkpoint.
nlp_pos = pipeline(
    "ner",
    model="sachaarbonel/bert-italian-cased-finetuned-pos",
    tokenizer=(
        'sachaarbonel/bert-italian-cased-finetuned-pos',
        {"use_fast": False}
))


text = "Roma è la Capitale d'Italia."

nlp_pos(text)
      
'''
Output:
--------
[{'entity': 'PROPN', 'index': 1, 'score': 0.9995346665382385, 'word': 'roma'},
 {'entity': 'AUX', 'index': 2, 'score': 0.9966597557067871, 'word': 'e'},
 {'entity': 'DET', 'index': 3, 'score': 0.9994786977767944, 'word': 'la'},
 {'entity': 'NOUN',
  'index': 4,
  'score': 0.9995198249816895,
  'word': 'capitale'},
 {'entity': 'ADP', 'index': 5, 'score': 0.9990894198417664, 'word': 'd'},
 {'entity': 'PART', 'index': 6, 'score': 0.57159024477005, 'word': "'"},
 {'entity': 'PROPN',
  'index': 7,
  'score': 0.9994804263114929,
  'word': 'italia'},
 {'entity': 'PUNCT', 'index': 8, 'score': 0.9772886633872986, 'word': '.'}]
'''
```
Yeah! Not too bad 🎉
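
For finer control over the output, the same checkpoint can also be called without the pipeline helper. The following is only a rough sketch using the standard `transformers` token-classification API; it is not taken from the original card.

```python
# Rough sketch (not from the original card): per-token tags without the
# pipeline helper, using the standard token-classification API.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "sachaarbonel/bert-italian-cased-finetuned-pos"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Roma è la Capitale d'Italia."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, label_id in zip(tokens, predicted_ids):
    if token in tokenizer.all_special_tokens:  # skip [CLS], [SEP], ...
        continue
    print(token, model.config.id2label[label_id.item()])
```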

> Created by [Sacha Arbonel/@sachaarbonel](https://twitter.com/sachaarbonel) | [LinkedIn](https://www.linkedin.com/in/sacha-arbonel)

> Made with <span style="color: #e25555;">&hearts;</span> in Paris