question-answering mask_token: [MASK]

mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es



Contributed by

mrm8488 Manuel Romero
88 models

How to use this model directly from the 🤗/transformers library:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es")
model = AutoModelForQuestionAnswering.from_pretrained("mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es")
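
A question-answering head of this kind produces one start score and one end score per input token, and the answer is the token span with the best combined score. The following is a minimal sketch of that selection step in plain Python, using made-up scores. The names `best_span` and `max_answer_len` are hypothetical, and the post-processing inside the library's own pipeline is more involved than this.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e] with s <= e."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length limit.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy scores for a 6-token input (illustrative values, not real model output).
start_logits = [0.1, 0.2, 3.0, 0.3, 0.1, 0.0]
end_logits = [0.0, 0.1, 0.5, 2.5, 0.2, 0.1]

start, end, score = best_span(start_logits, end_logits)
print(start, end, score)
```

With real model output, the token indices would still have to be mapped back to character offsets in the context to recover the answer text.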

BETO (Spanish BERT) + Spanish SQuAD2.0 + distillation using 'bert-base-multilingual-cased' as teacher

This model is a distilled version of BETO, fine-tuned on SQuAD-es-v2.0 for Q&A.

Distillation makes the model smaller, faster, cheaper and lighter than bert-base-spanish-wwm-cased-finetuned-spa-squad2-es

This model was fine-tuned on the same dataset, but with distillation applied during fine-tuning (as mentioned above) and for one more training epoch.

The teacher model for the distillation was bert-base-multilingual-cased. It is the same teacher used for distilbert-base-multilingual-cased, also known as DistilmBERT, which is on average twice as fast as mBERT-base.
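
The card does not spell out the loss used for distillation. As an illustration only, here is a sketch of a common soft-target formulation: the student is pushed toward the teacher's temperature-softened distribution through a KL divergence scaled by the squared temperature. The function names, the temperature value, and the toy logits are all assumptions, and in a QA setting a loss like this would typically be applied to the start and end logits separately.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0
    )
    return kl * temperature ** 2


# Toy start-position logits over 4 tokens (illustrative values only).
teacher_start = [0.2, 4.0, 0.5, 0.1]
student_start = [0.3, 2.5, 1.0, 0.2]

loss = distillation_loss(student_start, teacher_start)
print(loss)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.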

Details of the downstream task (Q&A) - Dataset


Dataset                    # Q&A
SQuAD2.0 Train             130 K
SQuAD2.0-es-v2.0           111 K
SQuAD2.0 Dev               12 K
SQuAD-es-v2.0-small Dev    69 K
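
Counts like these can be reproduced by walking the dataset file. The sketch below assumes the Spanish files follow the standard SQuAD v2.0 JSON layout (`data` → `paragraphs` → `qas`, with an `is_impossible` flag on unanswerable questions); the helper name `count_questions` and the tiny in-memory sample are hypothetical.

```python
def count_questions(squad):
    """Count answerable and unanswerable questions in a SQuAD v2-style dict."""
    has_answer = 0
    no_answer = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    no_answer += 1
                else:
                    has_answer += 1
    return has_answer, no_answer


# Tiny stand-in for a real file; a real one would be read with json.load().
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Ejemplo",
            "paragraphs": [
                {
                    "context": "Madrid es la capital de España.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "¿Cuál es la capital de España?",
                            "is_impossible": False,
                            "answers": [{"text": "Madrid", "answer_start": 0}],
                        },
                        {
                            "id": "q2",
                            "question": "¿Cuál es la capital de Francia?",
                            "is_impossible": True,
                            "answers": [],
                        },
                    ],
                }
            ],
        }
    ],
}

has_answer, no_answer = count_questions(sample)
print(has_answer, no_answer, has_answer + no_answer)
```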

Model training

The model was trained on a Tesla P100 GPU and 25GB of RAM with the following command:

!export SQUAD_DIR=/path/to/squad-v2_spanish \
&& python transformers/examples/distillation/ \
  --model_type bert \
  --model_name_or_path dccuchile/bert-base-spanish-wwm-cased \
  --teacher_type bert \
  --teacher_name_or_path bert-base-multilingual-cased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v2.json \
  --predict_file $SQUAD_DIR/dev-v2.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 5.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /content/model_output \
  --save_steps 5000 \
  --threads 4
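
The `--max_seq_length 384` and `--doc_stride 128` flags control how contexts longer than the model input are split into overlapping windows. The sketch below follows the sliding-window scheme from the original BERT SQuAD script; it is an assumption that the script used here behaves the same way, and the function name `doc_spans` and the token counts are hypothetical.

```python
def doc_spans(num_doc_tokens, num_query_tokens, max_seq_length=384, doc_stride=128):
    """Return (start, length) windows covering a tokenized context."""
    # Room left for the context after the question and 3 special tokens
    # ([CLS], [SEP] after the question, [SEP] after the context).
    max_doc_tokens = max_seq_length - num_query_tokens - 3
    if max_doc_tokens <= 0:
        raise ValueError("question is too long for the chosen max_seq_length")

    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_doc_tokens)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the context
        start += min(length, doc_stride)
    return spans


# A 1000-token context with a 10-token question.
spans = doc_spans(num_doc_tokens=1000, num_query_tokens=10)
for start, length in spans:
    print(f"tokens {start}..{start + length - 1}")
```

Because consecutive windows overlap, an answer that straddles a window boundary still appears whole in at least one window, provided it is shorter than the overlap.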


Metric    # Value
Exact     90.7748
F1        94.9471

{
  "exact": 90.77483309730933,
  "f1": 94.94714391266254,
  "total": 69202,
  "HasAns_exact": 86.60850599781898,
  "HasAns_f1": 92.90582885592328,
  "HasAns_total": 45850,
  "NoAns_exact": 98.95512161699212,
  "NoAns_f1": 98.95512161699212,
  "NoAns_total": 23352,
  "best_exact": 90.77483309730933,
  "best_exact_thresh": 0.0,
  "best_f1": 94.94714391266305,
  "best_f1_thresh": 0.0
}
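
The overall scores are the count-weighted averages of the answerable (`HasAns`) and unanswerable (`NoAns`) subsets. A short check using only the numbers reported above; the helper name `combine` is hypothetical.

```python
has_ans = {"exact": 86.60850599781898, "f1": 92.90582885592328, "total": 45850}
no_ans = {"exact": 98.95512161699212, "f1": 98.95512161699212, "total": 23352}


def combine(metric):
    """Weight each subset's score by its number of questions."""
    total = has_ans["total"] + no_ans["total"]
    weighted = has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    return weighted / total


overall_exact = combine("exact")
overall_f1 = combine("f1")
print(round(overall_exact, 4), round(overall_f1, 4))
```

Since roughly a third of the dev questions are unanswerable and those score close to 99, the overall figures sit well above the `HasAns` figures.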


Model                                                          f1 score
bert-base-spanish-wwm-cased-finetuned-spa-squad2-es            86.07
distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es    94.94

So, yes, this version is even more accurate.

Model in action

Fast usage with pipelines:

from transformers import pipeline

# Important: at the time of writing, the QA pipeline is not compatible with fast tokenizers,
# so pass {"use_fast": False} to the tokenizer, as in the following example:

nlp = pipeline(
    'question-answering',
    model='mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es',
    tokenizer=(
        'mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es',
        {"use_fast": False}
    )
)

nlp(
    {
        'question': '¿Para qué lenguaje está trabajando?',
        'context': 'Manuel Romero está colaborando activamente con huggingface/transformers ' +
                    'para traer el poder de las últimas técnicas de procesamiento de lenguaje natural al idioma español'
    }
)
# Output: {'answer': 'español', 'end': 169, 'score': 0.67530957344621, 'start': 163}

Play with this model and pipelines in a Colab:

Open In Colab

  1. Set the context and ask some questions:

Set context and questions

  2. Run predictions:

Run the model

Want to know more about Hugging Face pipelines? Check out this Colab:

Open In Colab

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain