distilbert-base-uncased-distilled-squad

This model is a fine-tuned version of distilbert-base-uncased on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.1892

Model description

The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, adistilled version of BERT, and the paper DistilBERT, adistilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than bert-base-uncased, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark.

This model is a fine-tune checkpoint of DistilBERT-base-uncased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1.

Results are my own reproduction of the development by Hugging Face.

How to Get Started with the Model

Use the code below:

from transformers import pipeline question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

context = r""" Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script. """

result = question_answerer(question="What is a good example of a question answering dataset?", context=context) print( f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"

Here is how to use this model in PyTorch:

from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering import torch tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad') model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt") with torch.no_grad(): outputs = model(**inputs)

answer_start_index = torch.argmax(outputs.start_logits) answer_end_index = torch.argmax(outputs.end_logits)

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] tokenizer.decode(predict_answer_tokens)

And in TensorFlow:

from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad") model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf") outputs = model(**inputs)

answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0]) answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] tokenizer.decode(predict_answer_tokens)

Uses:

This model can be used for question answering.

Intended uses & limitations

CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.

Training and evaluation data

This model reaches a F1 score of 82.75539002485876 and 'exact_match': 73.66130558183538 on the [SQuAD v1.1] dev set (for comparison, Bert bert-base-uncased version reaches a F1 score of 88.5).d

Training procedure

Preprocessing See the distilbert-base-uncased model card for further details.

Pretraining See the distilbert-base-uncased model card for further details.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss
1.2559	1.0	5533	1.1892

Framework versions

Transformers 4.37.2
Pytorch 2.1.0+cu121
Datasets 2.16.1
Tokenizers 0.15.1

Geerath
/

distilbert-base-uncased-distilled-squad