julien-c (HF staff) committed b8069a8 (1 parent: 78ee8a6)

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/bert-spanish-cased-finetuned-pos/README.md

Files changed (1)
  1. README.md +82 -0
README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ language: es
+ thumbnail: https://i.imgur.com/jgBdimh.png
+ ---
+
+ # Spanish BERT (BETO) + POS
+
+ This model is a version of the cased Spanish BERT [(BETO)](https://github.com/dccuchile/beto) fine-tuned on the Spanish [CoNLL corpora](https://www.kaggle.com/nltkdata/conll-corpora) for the **POS** (part-of-speech tagging) downstream task.
+
+ ## Details of the downstream task (POS) - Dataset
+
+ - [Dataset: CoNLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) with data augmentation techniques
+
+ I preprocessed the dataset and split it into train / dev sets (80/20):
+
+ | Dataset | # Examples |
+ | ------- | ---------- |
+ | Train   | 340 K      |
+ | Dev     | 50 K       |
+
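The preprocessing code is not part of this card, so the following is only a minimal sketch of what an 80/20 train / dev split over tagged sentences could look like. The function name `split_train_dev`, the seed, and the toy sentences are assumptions made for illustration, not the author's actual procedure.

```python
import random


def split_train_dev(sentences, dev_fraction=0.2, seed=42):
    """Shuffle a copy of `sentences` and split it into (train, dev).

    Hypothetical helper: the card does not describe how the split was made.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Toy data: each sentence is a list of (token, POS tag) pairs.
sentences = [[("Mis", "DP"), ("amigos", "NC")] for _ in range(10)]
train, dev = split_train_dev(sentences)
print(len(train), len(dev))  # 8 2
```

Splitting at the sentence level, rather than the token level, keeps every sentence intact in exactly one of the two sets.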
+
+ - [Fine-tuned with the NER script provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
+
+ - **60** labels covered:
+
+ ```
+ AO, AQ, CC, CS, DA, DD, DE, DI, DN, DP, DT, Faa, Fat, Fc, Fd, Fe, Fg, Fh, Fia, Fit, Fp, Fpa, Fpt, Fs, Ft, Fx, Fz, I, NC, NP, P0, PD, PI, PN, PP, PR, PT, PX, RG, RN, SP, VAI, VAM, VAN, VAP, VAS, VMG, VMI, VMM, VMN, VMP, VMS, VSG, VSI, VSM, VSN, VSP, VSS, Y and Z
+ ```
+
+
+ ## Metrics on the evaluation set
+
+ | Metric    | Score     |
+ | :-------: | :-------: |
+ | F1        | **90.06** |
+ | Precision | **89.46** |
+ | Recall    | **90.67** |
+
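As a quick sanity check, the three scores can be related to each other. The sketch below assumes the reported F1 is the harmonic mean of the reported precision and recall; the card does not state how the metrics were averaged, so treat this as an illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table, in percent.
f1 = f1_score(89.46, 90.67)
print(round(f1, 2))  # 90.06
```

Under that assumption the recomputed value agrees with the F1 reported in the table.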
+ ## Model in action
+
+ Fast usage with **pipelines**:
+
+ ```python
+ from transformers import pipeline
+
+ # The "ner" (token classification) pipeline is reused here for POS tagging.
+ nlp_pos = pipeline(
+     "ner",
+     model="mrm8488/bert-spanish-cased-finetuned-pos",
+     tokenizer=(
+         "mrm8488/bert-spanish-cased-finetuned-pos",
+         {"use_fast": False},  # load the slow (Python) tokenizer
+     ),
+ )
+
+ text = "Mis amigos están pensando en viajar a Londres este verano"
+
+ nlp_pos(text)
+
+ # Output:
+ '''
+ [{'entity': 'NC', 'score': 0.7792173624038696, 'word': '[CLS]'},
+  {'entity': 'DP', 'score': 0.9996283650398254, 'word': 'Mis'},
+  {'entity': 'NC', 'score': 0.9999253749847412, 'word': 'amigos'},
+  {'entity': 'VMI', 'score': 0.9998560547828674, 'word': 'están'},
+  {'entity': 'VMG', 'score': 0.9992249011993408, 'word': 'pensando'},
+  {'entity': 'SP', 'score': 0.9999602437019348, 'word': 'en'},
+  {'entity': 'VMN', 'score': 0.9998666048049927, 'word': 'viajar'},
+  {'entity': 'SP', 'score': 0.9999545216560364, 'word': 'a'},
+  {'entity': 'VMN', 'score': 0.8722310662269592, 'word': 'Londres'},
+  {'entity': 'DD', 'score': 0.9995203614234924, 'word': 'este'},
+  {'entity': 'NC', 'score': 0.9999248385429382, 'word': 'verano'},
+  {'entity': 'NC', 'score': 0.8802427649497986, 'word': '[SEP]'}]
+ '''
+ ```
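The output above also contains predictions for the `[CLS]` and `[SEP]` special tokens, which are usually not wanted downstream. A minimal post-processing sketch follows; the helper `strip_special_tokens` and the `SPECIAL_TOKENS` set are hypothetical names introduced here, and the input is a shortened copy of the output shown above rather than a live model call.

```python
# Special tokens assumed to follow the standard BERT convention.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def strip_special_tokens(predictions):
    """Return (word, tag) pairs, skipping BERT special tokens."""
    return [
        (p["word"], p["entity"])
        for p in predictions
        if p["word"] not in SPECIAL_TOKENS
    ]


# Shortened copy of the pipeline output shown above.
predictions = [
    {"entity": "NC", "score": 0.7792173624038696, "word": "[CLS]"},
    {"entity": "DP", "score": 0.9996283650398254, "word": "Mis"},
    {"entity": "NC", "score": 0.9999253749847412, "word": "amigos"},
    {"entity": "NC", "score": 0.8802427649497986, "word": "[SEP]"},
]

tagged = strip_special_tokens(predictions)
print(tagged)  # [('Mis', 'DP'), ('amigos', 'NC')]
```

This sketch does not merge words that the tokenizer splits into WordPiece sub-tokens (those prefixed with `##`); that would need an extra grouping step.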
+ ![model in action](https://media.giphy.com/media/jVC9m1cNrdIWuAAtjy/giphy.gif)
+
+ A version with 16 POS tags is also available [here](https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-pos-16-tags).
+
+
+ > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
+
+ > Made with <span style="color: #e25555;">&hearts;</span> in Spain