metadata

license: cc-by-nc-4.0
pipeline_tag: question-answering
tags:
  - question-answering
  - transformers
  - generated_from_trainer
datasets:
  - squad_v2
  - LLukas22/nq-simplified
  - newsqa
  - LLukas22/NLQuAD
  - deepset/germanquad

all-MiniLM-L12-v2-qa-all

This model is an extractive question-answering (QA) model. It is a fine-tuned version of all-MiniLM-L12-v2, trained on the following datasets: squad_v2, LLukas22/nq-simplified, newsqa, LLukas22/NLQuAD and deepset/germanquad.

Usage

You can run the model through the transformers question-answering pipeline:

from transformers import pipeline

# Load the question-answering pipeline
model_name = "LLukas22/all-MiniLM-L12-v2-qa-all"
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

QA_input = {
    "question": "What's my name?",
    "context": "My name is Clara and I live in Berkeley."
}

# Make predictions
result = nlp(QA_input)
print(result)  # dict with "score", "start", "end" and "answer" keys

Alternatively, you can load the model and tokenizer separately:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Load the model and tokenizer
model_name = "LLukas22/all-MiniLM-L12-v2-qa-all"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
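Loading the model this way gives you raw outputs rather than answers. An extractive QA model scores every token as a possible answer start and answer end, and the answer is the span whose start and end scores add up to the highest value, subject to start ≤ end. The sketch below shows that decoding step in plain Python. The function name best_span and the max_answer_len cap are hypothetical illustrations rather than part of this model or the transformers API. The transformers pipeline does more on top of this, such as restricting spans to context tokens (this naive version could pick a span inside the question) and optionally scoring a "no answer" option for SQuAD v2-style data.

```python
from typing import Sequence


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,  # hypothetical cap on answer length in tokens
) -> tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end.

    The score of a span is start_logits[start] + end_logits[end]; spans longer
    than max_answer_len tokens are not considered.
    """
    if len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must have the same length")
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Wiring to the model loaded above (not executed here; requires the weights):
#   inputs = tokenizer(question, context, return_tensors="pt")
#   outputs = model(**inputs)
#   start, end, _ = best_span(outputs.start_logits[0].tolist(),
#                             outputs.end_logits[0].tolist())
#   answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
```

Searching jointly over (start, end) pairs, instead of taking the two argmaxes independently, avoids returning an invalid span whose end comes before its start.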

Training hyperparameters

The following hyperparameters were used during training:

learning rate: 2E-05
per-device batch size: 60
effective batch size: 180
seed: 42
optimizer: AdamW with betas (0.9, 0.999) and eps 1E-08
weight decay: 1E-02
D-Adaptation: False
warmup: True
number of epochs: 15
mixed-precision training: bf16
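The effective batch size of 180 is three times the per-device batch size of 60. The list does not say whether that factor comes from three devices, three gradient-accumulation steps, or a mix, so the sketch below only makes the relation explicit: per-device batch size × number of devices × accumulation steps = effective batch size. The class QATrainingConfig and its method are hypothetical containers for the listed values, not code from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QATrainingConfig:
    """Hypothetical container for the hyperparameters listed above."""

    learning_rate: float = 2e-5
    per_device_batch_size: int = 60
    effective_batch_size: int = 180
    seed: int = 42
    adam_betas: tuple[float, float] = (0.9, 0.999)
    adam_eps: float = 1e-8
    weight_decay: float = 1e-2
    num_epochs: int = 15
    precision: str = "bf16"

    def accumulation_steps(self, num_devices: int) -> int:
        """Accumulation steps so that
        per_device_batch_size * num_devices * steps == effective_batch_size."""
        samples_per_step = self.per_device_batch_size * num_devices
        if num_devices < 1 or self.effective_batch_size % samples_per_step != 0:
            raise ValueError(
                f"{num_devices} device(s) cannot reach the effective batch size exactly"
            )
        return self.effective_batch_size // samples_per_step


cfg = QATrainingConfig()
print(cfg.accumulation_steps(1))  # single device: accumulate 3 steps
print(cfg.accumulation_steps(3))  # three devices: no accumulation needed
```

The optimizer fields map directly onto PyTorch's torch.optim.AdamW(params, lr=..., betas=..., eps=..., weight_decay=...). The warmup length and schedule shape are not given in this card.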

Training results

Epoch	Train Loss	Validation Loss
0	3.76	3.02
1	2.57	2.23
2	2.2	2.08
3	2.07	2.03
4	1.96	1.97
5	1.87	1.93
6	1.81	1.91
7	1.77	1.89
8	1.73	1.89
9	1.7	1.9
10	1.68	1.9
11	1.67	1.9

Evaluation results

Epoch	F1	Exact match
0	0.29	0.228
1	0.371	0.329
2	0.413	0.369
3	0.437	0.376
4	0.454	0.388
5	0.468	0.4
6	0.479	0.408
7	0.487	0.415
8	0.495	0.421
9	0.501	0.416
10	0.506	0.42
11	0.51	0.421
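The table reports F1 and exact match as fractions between 0 and 1 but does not define them. The sketch below assumes the usual SQuAD v2-style definitions: answers are lowercased and stripped of punctuation, English articles and extra whitespace; exact match checks whether the normalized strings are identical; F1 measures token overlap; and each prediction is scored against its best-matching reference answer. Whether the thesis computed the metrics exactly this way is an assumption, and the English-article stripping in particular does not carry over to German data such as deepset/germanquad. A table value would then be the mean of the per-example scores over the evaluation set.

```python
import collections
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    if not pred_tokens or not gold_tokens:
        # Unanswerable questions: credit only if both sides are empty.
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_example(prediction: str, ground_truths: list[str]) -> tuple[float, float]:
    """(EM, F1) against the best-matching reference; no references means 'no answer'."""
    golds = ground_truths or [""]
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(f1_score(prediction, g) for g in golds)
    return em, f1
```

Because F1 gives partial credit for overlapping tokens while exact match does not, F1 is always at least as high as exact match, which is consistent with the gap between the two columns above.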

Framework versions

Transformers: 4.25.1
PyTorch: 2.0.0.dev20230210+cu118
PyTorch Lightning: 1.8.6
Datasets: 2.7.1
Tokenizers: 0.13.1
Sentence Transformers: 2.2.2

Additional Information

This model was trained as part of my Master's thesis, 'Evaluation of transformer based language models for use in service information systems'. The source code is available on GitHub.