julien-c HF staff committed on
Commit
5c98468
1 Parent(s): c84cd1d

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/RuPERTa-base-finetuned-ner/README.md

Files changed (1)
  1. README.md +92 -0
README.md ADDED
@@ -0,0 +1,92 @@
+ ---
+ language: es
+ thumbnail:
+ ---
+
+ # RuPERTa-base (Spanish RoBERTa) + NER 🎃🏷
+
+ This model is a version of [RuPERTa-base](https://huggingface.co/mrm8488/RuPERTa-base) fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) for the **NER** (named entity recognition) downstream task.
+
+ ## Details of the downstream task (NER) - Dataset
+
+ - [Dataset: CoNLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) 📚
+
+ | Dataset | # Examples |
+ | ------- | ---------- |
+ | Train   | 329 K      |
+ | Dev     | 40 K       |
+
+
+ - [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
+
+ - Labels covered:
+
+ ```
+ B-LOC
+ B-MISC
+ B-ORG
+ B-PER
+ I-LOC
+ I-MISC
+ I-ORG
+ I-PER
+ O
+ ```
+
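The labels listed above use `B-` and `I-` prefixes, which usually mark the beginning and the continuation of an entity. As a minimal sketch under that assumption, per-word tags can be grouped into entity spans as follows. The function name `group_entities` and the sample sentence with its tags are made up for illustration and are not part of the model card.

```python
from typing import List, Tuple


def group_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-word B-/I-/O tags into (entity_text, entity_type) spans."""
    spans: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type = None
    for word, tag in zip(words, tags):
        prefix, _, ent_type = tag.partition("-")
        # Extend the open span only on an I- tag of the same entity type
        if prefix == "I" and ent_type == current_type:
            current_words.append(word)
            continue
        # Any other tag closes the span that is currently open
        if current_type is not None:
            spans.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
        # A B- tag (or a stray I- tag) opens a new span; "O" opens nothing
        if tag != "O":
            current_words, current_type = [word], ent_type
    if current_type is not None:
        spans.append((" ".join(current_words), current_type))
    return spans


# Hypothetical input, chosen only to exercise the function
words = ["Julien", "nació", "en", "Buenos", "Aires"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
print(group_entities(words, tags))
# → [('Julien', 'PER'), ('Buenos Aires', 'LOC')]
```

The sketch is deliberately lenient: an `I-` tag that does not continue an open span of the same type starts a new span instead of being dropped, since model predictions do not always begin an entity with `B-`.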
+ ## Metrics on evaluation set 🧾
+
+ | Metric    | Score     |
+ | :-------: | :-------: |
+ | F1        | **77.55** |
+ | Precision | **75.53** |
+ | Recall    | **79.68** |
+
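As a quick consistency check, and assuming the reported F1 is the harmonic mean of the reported precision and recall (the card does not say how the scores were averaged), the three numbers in the table agree:

```python
# Values copied from the metrics table
precision = 75.53
recall = 79.68

# F1 as the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))
# → 77.55
```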
+ ## Model in action 🔨
+
+
+ Example of usage:
+
+ ```python
+ import torch
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ # Load the fine-tuned checkpoint and its tokenizer from the Hub
+ # (model ID taken from this model card's repository path)
+ model_name = "mrm8488/RuPERTa-base-finetuned-ner"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+
+ id2label = {
+     "0": "B-LOC",
+     "1": "B-MISC",
+     "2": "B-ORG",
+     "3": "B-PER",
+     "4": "I-LOC",
+     "5": "I-MISC",
+     "6": "I-ORG",
+     "7": "I-PER",
+     "8": "O"
+ }
+
+ text = "Julien, CEO de HF, nació en Francia."
+ input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
+
+ # For a token-classification model, outputs[0] holds the per-token logits
+ outputs = model(input_ids)
+ logits = outputs[0]
+
+ # Rough alignment: position 0 is the start token and each following position
+ # is paired with one whitespace-separated word. This is exact only when
+ # every word maps to a single token.
+ words = text.split(" ")
+ for sequence_logits in logits:
+     for index, token_logits in enumerate(sequence_logits):
+         if 0 < index <= len(words):
+             print(words[index - 1] + ": " + id2label[str(torch.argmax(token_logits).item())])
+
+ '''
+ Output:
+ --------
+ Julien,: I-PER
+ CEO: O
+ de: O
+ HF,: B-ORG
+ nació: I-PER
+ en: I-PER
+ Francia.: I-LOC
+ '''
+ ```
+ Yeah! Not too bad 🎉
+
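The usage example above pairs token positions with whitespace-separated words, which drifts as soon as a word is split into several sub-word tokens. One common alternative, sketched here as an assumption rather than as the author's method, is to keep the prediction of the first sub-word of each word. The helper `labels_per_word` and the sample `word_ids` and `token_label_ids` values are hypothetical and were not produced by the model.

```python
from typing import Dict, List, Optional


def labels_per_word(
    word_ids: List[Optional[int]],
    token_label_ids: List[int],
    id2label: Dict[str, str],
) -> List[str]:
    """Keep the predicted label of the first sub-word token of each word."""
    labels: List[str] = []
    seen = set()
    for word_id, label_id in zip(word_ids, token_label_ids):
        # Special tokens have no word id; later sub-words of a word are skipped
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        labels.append(id2label[str(label_id)])
    return labels


id2label = {
    "0": "B-LOC", "1": "B-MISC", "2": "B-ORG", "3": "B-PER",
    "4": "I-LOC", "5": "I-MISC", "6": "I-ORG", "7": "I-PER", "8": "O",
}

# Hypothetical values: three words, the first one split into two sub-words,
# wrapped in a start and an end special token
word_ids = [None, 0, 0, 1, 2, None]
token_label_ids = [8, 3, 7, 8, 0, 8]
print(labels_per_word(word_ids, token_label_ids, id2label))
# → ['B-PER', 'O', 'B-LOC']
```

With a fast tokenizer from `transformers`, the `word_ids` list can be obtained from `encoding.word_ids(0)` after `encoding = tokenizer(text, return_tensors="pt")`, and `token_label_ids` from `logits.argmax(-1)[0].tolist()`. Whether this checkpoint ships a fast tokenizer is not stated in the card, so treat that part as an assumption to verify.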
+ > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
+
+ > Made with <span style="color: #e25555;">&hearts;</span> in Spain