|
--- |
|
language: |
|
- "en" |
|
license: mit |
|
datasets: |
|
- glue |
|
metrics: |
|
- f1
|
--- |
|
|
|
|
|
# Model Card for WeightWatcher/albert-large-v2-mrpc |
|
This model was finetuned on the GLUE MRPC task, starting from the pretrained
albert-large-v2 model. Hyperparameters were largely taken from the following
publication, with some minor exceptions noted below.
|
|
|
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
|
https://arxiv.org/abs/1909.11942 |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Developed by:** https://huggingface.co/cdhinrichs |
|
- **Model type:** Text Sequence Classification |
|
- **Language(s) (NLP):** English |
|
- **License:** MIT |
|
- **Finetuned from model:** https://huggingface.co/albert-large-v2 |
|
|
|
## Uses |
|
Text classification (paraphrase detection), for research and development.
|
|
|
### Out-of-Scope Use |
|
Not intended for production use. |
|
See https://huggingface.co/albert-large-v2 |
|
|
|
## Bias, Risks, and Limitations |
|
See https://huggingface.co/albert-large-v2 |
|
|
|
### Recommendations |
|
See https://huggingface.co/albert-large-v2 |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
from transformers import AlbertForSequenceClassification

# Load the finetuned ALBERT encoder and MRPC classification head.
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-mrpc")
```
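
For a full round trip, the snippet below (a minimal inference sketch, not part of the original training or evaluation code) pairs the model with a tokenizer and scores a sentence pair; the example sentences are illustrative only, and the tokenizer source is an assumption.

```python
import torch
from transformers import AlbertForSequenceClassification, AutoTokenizer

model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-mrpc")
# If tokenizer files are not bundled with this repository, the base
# albert-large-v2 tokenizer can be used instead.
tokenizer = AutoTokenizer.from_pretrained("albert-large-v2")

# MRPC is a sentence-pair task: predict whether two sentences are paraphrases.
sentence1 = "The company reported record profits this quarter."
sentence2 = "Record quarterly profits were reported by the company."

inputs = tokenizer(sentence1, sentence2, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

# In GLUE MRPC, label 1 corresponds to "equivalent" (paraphrase).
print(logits.argmax(dim=-1).item())
```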
|
|
|
## Training Details |
|
|
|
### Training Data |
|
See https://huggingface.co/datasets/glue#mrpc |
|
|
|
MRPC (the Microsoft Research Paraphrase Corpus) is a sentence-pair
classification task and part of the GLUE benchmark.
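
For reference, the data can be loaded with the Hugging Face datasets library (a sketch; not necessarily how the training data were prepared for this model):

```python
from datasets import load_dataset

# MRPC subset of GLUE; splits are "train", "validation", and "test".
mrpc = load_dataset("glue", "mrpc")
print(mrpc["train"][0])  # fields: sentence1, sentence2, label, idx
```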
|
|
|
|
|
### Training Procedure |
|
The pretrained ALBERT model at https://huggingface.co/albert-large-v2 was
finetuned using Adam optimization.
|
|
|
A checkpoint from MNLI was NOT used as the starting point, which differs from footnote 4 of
|
|
|
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
|
https://arxiv.org/abs/1909.11942 |
|
|
|
|
|
#### Training Hyperparameters |
|
Training hyperparameters (learning rate, batch size, ALBERT dropout rate,
classifier dropout rate, warmup steps, and training steps) were taken from
Table A.4 of
|
|
|
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
|
https://arxiv.org/abs/1909.11942 |
|
|
|
The maximum sequence length (MSL) was set to 128, differing from the value used in the publication above.
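
As an illustration of this setup, the sketch below finetunes albert-large-v2 on MRPC with the transformers Trainer (which uses AdamW by default) and a maximum sequence length of 128. The numeric hyperparameter values shown are placeholders rather than the Table A.4 values, and the tokenization and padding choices are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AlbertForSequenceClassification,
    AlbertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = AlbertTokenizerFast.from_pretrained("albert-large-v2")
model = AlbertForSequenceClassification.from_pretrained("albert-large-v2", num_labels=2)

mrpc = load_dataset("glue", "mrpc")

def tokenize(batch):
    # Maximum sequence length of 128, as noted above.
    return tokenizer(
        batch["sentence1"], batch["sentence2"],
        truncation=True, max_length=128, padding="max_length",
    )

mrpc = mrpc.map(tokenize, batched=True)

# Placeholder hyperparameters; substitute the MRPC values from Table A.4.
args = TrainingArguments(
    output_dir="albert-large-v2-mrpc",
    learning_rate=2e-5,               # placeholder
    per_device_train_batch_size=32,   # placeholder
    warmup_steps=200,                 # placeholder
    max_steps=800,                    # placeholder
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=mrpc["train"],
    eval_dataset=mrpc["validation"],
)
trainer.train()
```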
|
|
|
|
|
## Evaluation |
|
F1 score is used to evaluate model performance. |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
See https://huggingface.co/datasets/glue#mrpc |
|
|
|
#### Metrics |
|
F1 score |
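
The GLUE MRPC metric, which reports both accuracy and F1, can be computed with the evaluate library, for example (a sketch; not the original evaluation code):

```python
import evaluate

# Loads the GLUE metric configured for MRPC; compute() returns accuracy and F1.
metric = evaluate.load("glue", "mrpc")
print(metric.compute(predictions=[1, 0, 1], references=[1, 0, 0]))
```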
|
|
|
### Results |
|
Training F1 score: 0.9963621665319321 |
|
|
|
Evaluation F1 score: 0.9176882661996497 |
|
|
|
|
|
## Environmental Impact |
|
The model was finetuned on a single user workstation with a single GPU. CO2 |
|
impact is expected to be minimal. |
|
|
|
|