Dani committed on
Commit 255b995
1 Parent(s): d2f8e65

Update readme and config

Files changed (2)
  1. README.md +31 -0
  2. config.json +1 -1
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ language: spanish
+ license: apache-2.0
+ datasets:
+ - wikipedia
+ widget:
+ - text: "El español es un idioma muy [MASK] en el mundo."
+ ---
+
+ # DistilBERT base multilingual model Spanish subset (cased)
+
+ This model is the Spanish extract of `distilbert-base-multilingual-cased`, a distilled version of the [BERT base multilingual model](bert-base-multilingual-cased). It uses the extraction method proposed by Geotrend, which is described in https://github.com/Geotrend-research/smaller-transformers.
+
+ In particular, we ran the following script:
+
+ ```sh
+ python reduce_model.py \
+   --source_model distilbert-base-multilingual-cased \
+   --vocab_file notebooks/selected_tokens/selected_es_tokens.txt \
+   --output_model distilbert-base-es-multilingual-cased \
+   --convert_to_tf False
+ ```
+
+ The resulting model has the same architecture as DistilmBERT: 6 layers, a hidden dimension of 768, and 12 attention heads, for a total of **65M parameters** (compared to 134M parameters for DistilmBERT).
+
+ The goal of this model is to further reduce the size of `distilbert-base-multilingual-cased` by keeping only the most frequent Spanish tokens, which shrinks the embedding layer. For more details, see the Geotrend team's paper: Load What You Need: Smaller Versions of Multilingual BERT.
+
+
+
+
+
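For reference, the widget example in the new model card can be reproduced with the `transformers` fill-mask pipeline. This is a minimal sketch, not part of the commit; the repository id is assumed from the `--output_model` name in the script above:

```python
from transformers import pipeline

# Repository id assumed from the --output_model flag in the README script.
unmasker = pipeline("fill-mask", model="distilbert-base-es-multilingual-cased")

# Widget sentence from the model card ("Spanish is a very [MASK] language in the world.")
for pred in unmasker("El español es un idioma muy [MASK] en el mundo."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```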
config.json CHANGED
@@ -1,7 +1,7 @@
  {
    "activation": "gelu",
    "architectures": [
-     "DistilBertModel"
+     "DistilBertForMaskedLM"
    ],
    "attention_dropout": 0.1,
    "dim": 768,