julien-c HF staff committed on
Commit
e036269
1 Parent(s): 594be06

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/TinyBERT-spanish-uncased-finetuned-ner/README.md

Files changed (1)
  1. README.md +106 -0
README.md ADDED
@@ -0,0 +1,106 @@
+ ---
+ language: es
+ thumbnail:
+ ---
+
+ # Spanish TinyBERT + NER
+
+ This model is a [Spanish TinyBERT](https://huggingface.co/mrm8488/es-tinybert-v1-1) model, which I created using *distillation*, fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) for the **NER** downstream task. The **size** of the model is **55 MB**.
+
+ ## Details of the downstream task (NER) - Dataset
+
+ - [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora)
+
+ I preprocessed the dataset and split it into train / dev sets (80/20):
+
+ | Dataset | # Examples |
+ | ---------------------- | ----- |
+ | Train | 8.7 K |
+ | Dev | 2.2 K |
+
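The 80/20 split described above could be reproduced along these lines. This is a minimal sketch, not the author's preprocessing script: the `split_train_dev` helper, the fixed seed, and the shuffling step are assumptions made for illustration.

```python
import random


def split_train_dev(examples, train_fraction=0.8, seed=42):
    """Shuffle a copy of the examples and split it into (train, dev)."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible across runs
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data standing in for tagged sentences (hypothetical, not the real corpus)
sentences = [f"sentence-{i}" for i in range(10)]
train, dev = split_train_dev(sentences)
print(len(train), len(dev))  # 8 2
```

Shuffling before cutting avoids a dev set made only of the last documents in the corpus; whether the original split was shuffled is not stated in the card.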
+
+ - [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
+
+ - Labels covered:
+
+ ```
+ B-LOC
+ B-MISC
+ B-ORG
+ B-PER
+ I-LOC
+ I-MISC
+ I-ORG
+ I-PER
+ O
+ ```
+
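These labels follow the BIO convention: `B-` marks the first token of an entity, `I-` a continuation, and `O` a token outside any entity. A minimal sketch of how such tags could be grouped into entity spans follows; the `bio_to_entities` helper and the sample sentence are hypothetical and not part of the model or of `transformers`.

```python
def bio_to_entities(words, tags):
    """Group (word, BIO tag) pairs into (entity_text, entity_type) tuples."""
    entities = []
    current_words, current_type = [], None
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current_type == label:
            # Continuation of the entity that is currently open
            current_words.append(word)
            continue
        # Any other tag closes the open entity, if there is one
        if current_words:
            entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
        if prefix in ("B", "I"):
            # "B-" opens a new entity; a stray "I-" is leniently treated as an opening
            current_words, current_type = [word], label
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


words = ["Juan", "Carlos", "vive", "en", "Madrid"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_entities(words, tags))
# [('Juan Carlos', 'PER'), ('Madrid', 'LOC')]
```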
+ ## Metrics on evaluation set:
+
+ | Metric | Score |
+ | :------------------------------------------------------------------------------------: | :-------: |
+ | F1 | **70.00** |
+ | Precision | **67.83** |
+ | Recall | **71.46** |
+
+ ## Comparison:
+
+ | Model | F1 score | Size (MB) |
+ | :--------------------------------------------------------------------------------------------------------------: | :-------: |:------|
+ | bert-base-spanish-wwm-cased (BETO) | 88.43 | 421 |
+ | [bert-spanish-cased-finetuned-ner](https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner) | **90.17** | 420 |
+ | Best Multilingual BERT | 87.38 | 681 |
+ | TinyBERT-spanish-uncased-finetuned-ner (this one) | 70.00 | **55** |
+
+ ## Model in action
+
+ Example of usage:
+
+ ```python
+ import torch
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ # Map class indices (as strings) to the NER labels listed above
+ id2label = {
+     "0": "B-LOC",
+     "1": "B-MISC",
+     "2": "B-ORG",
+     "3": "B-PER",
+     "4": "I-LOC",
+     "5": "I-MISC",
+     "6": "I-ORG",
+     "7": "I-PER",
+     "8": "O"
+ }
+
+ tokenizer = AutoTokenizer.from_pretrained('mrm8488/TinyBERT-spanish-uncased-finetuned-ner')
+ model = AutoModelForTokenClassification.from_pretrained('mrm8488/TinyBERT-spanish-uncased-finetuned-ner')
+ text = "Mis amigos están pensando viajar a Londres este verano."
+ input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
+
+ # Inference only, so gradients are not needed
+ with torch.no_grad():
+     outputs = model(input_ids)
+ logits = outputs[0]  # per-token label scores, shape (batch, tokens, labels)
+
+ words = text.split(" ")
+ for sequence in logits:
+     for index, token_scores in enumerate(sequence):
+         # Skip [CLS] at index 0; this assumes one token per whitespace-separated word
+         if 0 < index <= len(words):
+             print(words[index - 1] + ": " + id2label[str(torch.argmax(token_scores).item())])
+
+ '''
+ Output:
+ --------
+ Mis: O
+ amigos: O
+ están: O
+ pensando: O
+ viajar: O
+ a: O
+ Londres: B-LOC
+ este: O
+ verano.: O
+ '''
+ ```
+
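The usage example pairs the i-th token with the i-th whitespace-separated word, which only holds when the tokenizer keeps every word as a single token. WordPiece tokenizers can split a word into several pieces, so a more robust variant maps tokens back to words first. The sketch below assumes per-token word indices like those returned by `BatchEncoding.word_ids()` on fast tokenizers (`None` for special tokens) and labels each word by its first sub-token. The `first_subtoken_labels` helper and the sample values are hypothetical, not output from this model.

```python
def first_subtoken_labels(word_ids, token_label_ids, id2label):
    """Return one label per word, taken from the word's first sub-token."""
    labels = {}
    for word_id, label_id in zip(word_ids, token_label_ids):
        if word_id is None:
            continue  # special tokens such as [CLS] and [SEP]
        if word_id not in labels:
            # Later pieces of the same word are ignored
            labels[word_id] = id2label[str(label_id)]
    return [labels[i] for i in sorted(labels)]


id2label = {"0": "B-LOC", "8": "O"}  # subset of the card's mapping, for illustration
# Hypothetical tokenization: word 1 is split into two pieces
word_ids = [None, 0, 1, 1, None]
token_label_ids = [8, 8, 0, 8, 8]
print(first_subtoken_labels(word_ids, token_label_ids, id2label))
# ['O', 'B-LOC']
```

Taking the first sub-token is one common choice; majority voting over a word's pieces is another, and the card does not say which convention was used during fine-tuning.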
+ > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
+
+ > Made with <span style="color: #e25555;">&hearts;</span> in Spain