julien-c HF staff committed on
Commit
812e791
1 Parent(s): 56a7947

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/RuPERTa-base-finetuned-pos/README.md

Files changed (1)
  1. README.md +111 -0
README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ language: es
+ thumbnail:
+ ---
+
+ # RuPERTa-base (Spanish RoBERTa) + POS 🎃🏷
+
+ This model is a version of [RuPERTa-base](https://huggingface.co/mrm8488/RuPERTa-base) fine-tuned on the [CONLL CORPORA](https://www.kaggle.com/nltkdata/conll-corpora) dataset for the **POS** (part-of-speech tagging) downstream task.
+
+ ## Details of the downstream task (POS) - Dataset
+
+ - [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 📚
+
+ | Dataset | # Examples |
+ | ---------------------- | ----- |
+ | Train | 445 K |
+ | Dev | 55 K |
+
+ - [Fine-tuning script: the NER example script provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
+
+ - Labels covered:
+
+ ```
+ ADJ
+ ADP
+ ADV
+ AUX
+ CCONJ
+ DET
+ INTJ
+ NOUN
+ NUM
+ PART
+ PRON
+ PROPN
+ PUNCT
+ SCONJ
+ SYM
+ VERB
+ ```
+
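The `id2label` mapping used in the usage example of this card can be derived from the label list rather than typed by hand. The following is a minimal sketch, assuming (as that mapping suggests) that index 0 is reserved for an `O` tag and that the remaining labels keep the order listed here; `label2id` is a hypothetical helper name, not something the card defines.

```python
# POS labels in the order listed in the model card
labels = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN",
    "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB",
]

# Assumption: index 0 is "O", followed by the labels in listed order.
# Keys are strings to match the mapping used in the card's usage example.
id2label = {str(index): label for index, label in enumerate(["O"] + labels)}

# Hypothetical inverse mapping, handy when preparing training data
label2id = {label: index for index, label in id2label.items()}

print(id2label["12"])     # PROPN
print(label2id["VERB"])   # 16
```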
+ ## Metrics on evaluation set 🧾
+
+ | Metric | Score |
+ | :------------------------------------------------------------------------------------: | :-------: |
+ | F1 | **97.39** |
+ | Precision | **97.47** |
+ | Recall | **97.32** |
+
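The three scores can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. A minimal sketch, assuming the scores are percentages and that the recall is 97.32; `f1_from` is a hypothetical helper name.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Scores taken from the metrics table (recall read as 97.32, an assumption)
precision = 97.47
recall = 97.32

f1 = f1_from(precision, recall)
print(round(f1, 2))  # 97.39
```

The result agrees with the reported F1 to two decimals, which supports reading the recall as 97.32.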
+ ## Model in action 🔨
+
+ Example of usage:
+
+ ```python
+ import torch
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained('mrm8488/RuPERTa-base-finetuned-pos')
+ model = AutoModelForTokenClassification.from_pretrained('mrm8488/RuPERTa-base-finetuned-pos')
+ model.eval()  # inference mode: disables dropout
+
+ # Maps a predicted class index (as a string) to its POS tag
+ id2label = {
+     "0": "O",
+     "1": "ADJ",
+     "2": "ADP",
+     "3": "ADV",
+     "4": "AUX",
+     "5": "CCONJ",
+     "6": "DET",
+     "7": "INTJ",
+     "8": "NOUN",
+     "9": "NUM",
+     "10": "PART",
+     "11": "PRON",
+     "12": "PROPN",
+     "13": "PUNCT",
+     "14": "SCONJ",
+     "15": "SYM",
+     "16": "VERB"
+ }
+
+ text = "Mis amigos están pensando viajar a Londres este verano."
+ words = text.split(" ")
+
+ # Encode the sentence (special tokens included) and add a batch dimension
+ input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
+
+ with torch.no_grad():
+     outputs = model(input_ids)
+
+ # outputs[0] holds the per-token classification logits,
+ # shaped (batch, sequence_length, num_labels)
+ logits = outputs[0]
+
+ # Pair the i-th token after the leading special token with the i-th word.
+ # This is only exact when every word is encoded as a single token.
+ for sentence_logits in logits:
+     for index, token_logits in enumerate(sentence_logits):
+         if 0 < index <= len(words):
+             print(words[index - 1] + ": " + id2label[str(torch.argmax(token_logits).item())])
+
+ '''
+ Output:
+ --------
+ Mis: NUM
+ amigos: PRON
+ están: AUX
+ pensando: ADV
+ viajar: VERB
+ a: ADP
+ Londres: PROPN
+ este: DET
+ verano.: NOUN
+ '''
+ ```
+ Yeah! Not too bad 🎉
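The usage example pairs tokens with whitespace-separated words by position, which only lines up when each word maps to exactly one token. One common alternative is to give each word the label of its first sub-token. The sketch below illustrates that idea in plain Python; `tag_words` and the toy inputs are hypothetical, and the token-to-word indices are assumed to come from the tokenizer.

```python
def tag_words(words, word_ids, token_label_ids, id2label):
    """Assign each word the label predicted for its first sub-token.

    words:           the words of the sentence
    word_ids:        for each token, the index of the word it belongs to,
                     or None for special tokens
    token_label_ids: predicted class index for each token
    id2label:        maps str(class index) to a tag, as in the card's example
    """
    tags = [None] * len(words)
    for word_index, label_id in zip(word_ids, token_label_ids):
        if word_index is None:
            continue  # special token, not part of any word
        if tags[word_index] is None:  # keep only the first sub-token's label
            tags[word_index] = id2label[str(label_id)]
    return list(zip(words, tags))


# Hypothetical toy data: "Londres" is assumed to be split into two sub-tokens
id2label = {"2": "ADP", "12": "PROPN", "16": "VERB"}
words = ["viajar", "a", "Londres"]
word_ids = [None, 0, 1, 2, 2, None]
token_label_ids = [0, 16, 2, 12, 12, 0]

result = tag_words(words, word_ids, token_label_ids, id2label)
print(result)  # [('viajar', 'VERB'), ('a', 'ADP'), ('Londres', 'PROPN')]
```

With a fast tokenizer from `transformers`, the `word_ids` list can usually be obtained by encoding with `tokenizer(words, is_split_into_words=True)` and calling `.word_ids()` on the result. Whether this checkpoint ships a fast tokenizer, and whether it needs `add_prefix_space=True` for pre-split input, is not stated in the card and should be checked.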
+
+ > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
+
+ > Made with <span style="color: #e25555;">&hearts;</span> in Spain