Dani committed on
Commit 6663a43
1 Parent(s): efb0a99

fixed several vocab issues

Files changed (4)
  1. README.md +4 -13
  2. config.json +1 -1
  3. pytorch_model.bin +2 -2
  4. vocab.txt +13 -0
README.md CHANGED
@@ -4,24 +4,15 @@ license: apache-2.0
  datasets:
  - wikipedia
  widget:
- - text: "El español es un idioma muy [MASK] en el mundo."
+ - text: "Mi nombre es Juan y vivo en [MASK]."
  ---
 
  # DistilBERT base multilingual model Spanish subset (cased)
 
  This model is the Spanish extract of `distilbert-base-multilingual-cased` (https://huggingface.co/distilbert-base-multilingual-cased), a distilled version of the [BERT base multilingual model](bert-base-multilingual-cased). This model is cased: it does make a difference between english and English.
 
- It uses the extraction method proposed by Geotrend, which is described in https://github.com/Geotrend-research/smaller-transformers.
- Specifically, we've ran the following script:
+ It uses the extraction method proposed by Geotrend described in https://github.com/Geotrend-research/smaller-transformers.
 
- ```sh
- python reduce_model.py \
- --source_model distilbert-base-multilingual-cased \
- --vocab_file notebooks/selected_tokens/selected_es_tokens.txt \
- --output_model distilbert-base-es-multilingual-cased \
- --convert_to_tf False
- ```
-
- The resulting model has the same architecture as DistilmBERT: 6 layers, 768 dimension and 12 heads, with a total of **65M parameters** (compared to 134M parameters for DistilmBERT).
+ The resulting model has the same architecture as DistilmBERT: 6 layers, 768 dimension and 12 heads, with a total of **63M parameters** (compared to 134M parameters for DistilmBERT).
 
- The goal of this model is to reduce even further the size of the `distilbert-base-multilingual` multilingual model by selecting only most frequent tokens for Spanish, reducing the size of the embedding layer. For more details visit the paper from the Geotrend team: Load What You Need: Smaller Versions of Multilingual BERT.
+ The goal of this model is to reduce even further the size of the `distilbert-base-multilingual` multilingual model by selecting only most frequent tokens for Spanish, reducing the size of the embedding layer. For more details visit the paper from the Geotrend team: Load What You Need: Smaller Versions of Multilingual BERT.
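After this commit, the README's new widget sentence can be reproduced locally with the `transformers` fill-mask pipeline. A minimal sketch, assuming the model id `distilbert-base-es-multilingual-cased` (taken from the extraction script removed above; substitute the actual Hub repo id if it differs):

```python
from transformers import pipeline

# Assumption: the model id below matches this repository; adjust the
# namespace/name if the Hub id differs.
fill_mask = pipeline("fill-mask", model="distilbert-base-es-multilingual-cased")

# Same sentence as the widget example added in this commit.
for pred in fill_mask("Mi nombre es Juan y vivo en [MASK]."):
    print(pred["token_str"], round(pred["score"], 4))
```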
 
 
config.json CHANGED
@@ -18,5 +18,5 @@
   "seq_classif_dropout": 0.2,
   "sinusoidal_pos_embds": false,
   "tie_weights_": true,
-  "vocab_size": 26346
+  "vocab_size": 26360
  }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0a7e9034002f6027c9c3e2644bf743b008fc7081072839124abd6673e6740c5c
- size 255139145
+ oid sha256:02e8562d1e4f7f2fe58e9970fa28b3544b066591bc475777c823ab10adcd9af2
+ size 255182217
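pytorch_model.bin is stored as a Git LFS pointer, so only its oid and size change in this commit. A minimal sketch for checking a locally downloaded copy against the new pointer values (the local file path is an assumption):

```python
import hashlib

# Assumed local path to the downloaded weights file; adjust as needed.
path = "pytorch_model.bin"

# Expected values taken from the updated LFS pointer in this commit.
expected_oid = "02e8562d1e4f7f2fe58e9970fa28b3544b066591bc475777c823ab10adcd9af2"
expected_size = 255182217

digest = hashlib.sha256()
size = 0
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
        size += len(chunk)

print("size matches:  ", size == expected_size)
print("sha256 matches:", digest.hexdigest() == expected_oid)
```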
vocab.txt CHANGED
@@ -1,4 +1,17 @@
+ [PAD]
+ [unused1]
+ [unused2]
+ [unused3]
+ [unused4]
+ [unused5]
+ [unused6]
+ [unused7]
+ [unused8]
+ [unused9]
  [UNK]
+ [CLS]
+ [SEP]
+ [MASK]
  !
  "
  #
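Since the commit message says it "fixed several vocab issues", a quick consistency check across vocab.txt, config.json, and the embedding matrix can confirm the files now agree. A sketch assuming the files sit in a local clone of this repository and that the checkpoint uses standard DistilBertForMaskedLM parameter names:

```python
import json
import torch

# Assumed paths inside a local clone of this repository.
with open("vocab.txt", encoding="utf-8") as f:
    num_tokens = len(f.read().splitlines())

with open("config.json", encoding="utf-8") as f:
    vocab_size = json.load(f)["vocab_size"]

# Assumption: the checkpoint uses the standard DistilBertForMaskedLM key
# for the word-embedding matrix.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
embedding_rows = state_dict["distilbert.embeddings.word_embeddings.weight"].shape[0]

# After this commit, all three values should agree (config.json sets 26360).
print("vocab.txt tokens: ", num_tokens)
print("config vocab_size:", vocab_size)
print("embedding rows:   ", embedding_rows)
```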