all-MiniLM-L12-v2-qa-all
This is an extractive question-answering (QA) model. It is a fine-tuned version of all-MiniLM-L12-v2, trained on the following datasets: squad_v2, LLukas22/nq-simplified, newsqa, LLukas22/NLQuAD, deepset/germanquad.
Usage
You can use the model like this:
from transformers import pipeline
# Build a question-answering pipeline and make a prediction
model_name = "LLukas22/all-MiniLM-L12-v2-qa-all"
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    "question": "What's my name?",
    "context": "My name is Clara and I live in Berkeley."
}
result = nlp(QA_input)
print(result)
Alternatively, you can load the model and tokenizer separately:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
# Load the model and tokenizer (no prediction is made here)
model_name = "LLukas22/all-MiniLM-L12-v2-qa-all"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
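Loading the model and tokenizer separately leaves answer extraction to you. An extractive QA model emits one start logit and one end logit per token, and the answer is the token span with the highest combined score. The following is a minimal sketch of that selection step on toy values. The helper `best_span` and its `max_answer_len` parameter are hypothetical names used for illustration; they are not part of transformers, and the pipeline's own post-processing does more than this.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,  # hypothetical cap on span length, in tokens
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring span with start <= end."""
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must be non-empty and equal in length")

    best = (0, 0, float("-inf"))
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Toy logits for a 6-token input (made-up values, not model output).
toy_start = [0.0, 0.25, 3.5, 0.5, 0.0, -1.0]
toy_end = [0.0, 0.25, 0.5, 2.5, 0.25, -0.5]
span = best_span(toy_start, toy_end)
print(span)
```

With the loaded model, the two lists would typically come from `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, where `outputs` is the result of calling the model on `tokenizer(question, context, return_tensors="pt")`. The selected token indices can then be mapped back to text with `tokenizer.decode` on the corresponding slice of `input_ids`.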
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2E-05
- per-device batch size: 60
- effective batch size: 180
- seed: 42
- optimizer: AdamW with betas (0.9, 0.999) and eps 1E-08
- weight decay: 1E-02
- D-Adaptation: False
- Warmup: True
- number of epochs: 15
- mixed_precision_training: bf16
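The per-device and effective batch sizes listed above differ by a factor of three. The source does not say whether that factor comes from three devices, three gradient-accumulation steps, or a combination, so this sketch only makes the arithmetic explicit. The variable names are illustrative.

```python
per_device_batch_size = 60
effective_batch_size = 180

# Effective batch size = per-device batch size * (devices * accumulation steps).
# How the multiplier splits between devices and accumulation is an open
# assumption; the model card does not state it.
multiplier, remainder = divmod(effective_batch_size, per_device_batch_size)
if remainder != 0:
    raise ValueError("effective batch size is not a multiple of the per-device size")

print(multiplier)
```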
Training results
Epoch | Train Loss | Validation Loss |
---|---|---|
0 | 3.76 | 3.02 |
1 | 2.57 | 2.23 |
2 | 2.2 | 2.08 |
3 | 2.07 | 2.03 |
4 | 1.96 | 1.97 |
5 | 1.87 | 1.93 |
6 | 1.81 | 1.91 |
7 | 1.77 | 1.89 |
8 | 1.73 | 1.89 |
9 | 1.7 | 1.9 |
10 | 1.68 | 1.9 |
11 | 1.67 | 1.9 |
Evaluation results
Epoch | F1 | Exact Match |
---|---|---|
0 | 0.29 | 0.228 |
1 | 0.371 | 0.329 |
2 | 0.413 | 0.369 |
3 | 0.437 | 0.376 |
4 | 0.454 | 0.388 |
5 | 0.468 | 0.4 |
6 | 0.479 | 0.408 |
7 | 0.487 | 0.415 |
8 | 0.495 | 0.421 |
9 | 0.501 | 0.416 |
10 | 0.506 | 0.42 |
11 | 0.51 | 0.421 |
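To read the two tables together, the sketch below copies the logged validation loss and F1 values into plain lists and looks up the epoch with the lowest validation loss and the epoch with the highest F1. The list and variable names are illustrative, and ties resolve to the earliest epoch.

```python
# Values copied from the tables above; the list index is the epoch number.
val_loss = [3.02, 2.23, 2.08, 2.03, 1.97, 1.93, 1.91, 1.89, 1.89, 1.9, 1.9, 1.9]
f1 = [0.29, 0.371, 0.413, 0.437, 0.454, 0.468, 0.479, 0.487, 0.495, 0.501, 0.506, 0.51]

# min/max return the first matching index when several epochs tie.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
best_f1_epoch = max(range(len(f1)), key=f1.__getitem__)

print(f"lowest validation loss: {val_loss[best_loss_epoch]} at epoch {best_loss_epoch}")
print(f"highest F1: {f1[best_f1_epoch]} at epoch {best_f1_epoch}")
```

Validation loss bottoms out at 1.89 in epochs 7 and 8, while F1 keeps rising through epoch 11, so the two criteria point to different checkpoints.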
Framework versions
- Transformers: 4.25.1
- PyTorch: 2.0.0.dev20230210+cu118
- PyTorch Lightning: 1.8.6
- Datasets: 2.7.1
- Tokenizers: 0.13.1
- Sentence Transformers: 2.2.2
Additional Information
This model was trained as part of my master's thesis, 'Evaluation of transformer based language models for use in service information systems'. The source code is available on GitHub.