---
language:
- "en"
license: mit
datasets:
- glue
metrics:
- F1 score
---
# Model Card for WeightWatcher/albert-large-v2-qqp
This model was finetuned on the GLUE QQP (Quora Question Pairs) task, starting
from the pretrained albert-large-v2 model. Hyperparameters were largely taken
from the following publication; the exceptions are noted under Training
Hyperparameters.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
## Model Details
### Model Description
- **Developed by:** https://huggingface.co/cdhinrichs
- **Model type:** Text Sequence Classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** https://huggingface.co/albert-large-v2
## Uses
Text classification (deciding whether two questions are duplicates), research and development.
### Out-of-Scope Use
Not intended for production use.
See https://huggingface.co/albert-large-v2
## Bias, Risks, and Limitations
See https://huggingface.co/albert-large-v2
### Recommendations
See https://huggingface.co/albert-large-v2
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AlbertForSequenceClassification
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-qqp")
```
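
The snippet above only loads the weights. As a minimal sketch of scoring a question pair, the code below assumes that the tokenizer can be taken from the base `albert-large-v2` checkpoint and that the label order follows the GLUE QQP dataset (0 = not duplicate, 1 = duplicate); neither is confirmed by this card. The helper names `softmax` and `classify_pair` are illustrative only.

```python
import math

MODEL_ID = "WeightWatcher/albert-large-v2-qqp"
TOKENIZER_ID = "albert-large-v2"  # assumption: reuse the base model's tokenizer
MAX_SEQ_LEN = 128  # max sequence length stated in this card
# Assumption: label order follows the GLUE QQP dataset.
LABELS = ("not_duplicate", "duplicate")


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pair(question1, question2):
    """Download the model and tokenizer, then score one question pair."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AlbertForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AlbertForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Encode both questions as a single sentence-pair input.
    inputs = tokenizer(question1, question2, truncation=True,
                       max_length=MAX_SEQ_LEN, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Example call (downloads the checkpoints on first use):
# label, prob = classify_pair("How do I learn Python?",
#                             "What is the best way to learn Python?")
```

`classify_pair` reloads the model on every call for brevity; for repeated use, load the model and tokenizer once and reuse them.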
## Training Details
### Training Data
See https://huggingface.co/datasets/glue#qqp
QQP (Quora Question Pairs) is a binary classification task in the GLUE
benchmark: given a pair of questions, predict whether they are duplicates.
### Training Procedure
The pretrained ALBERT model at https://huggingface.co/albert-large-v2 was
finetuned using the Adam optimizer.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
#### Training Hyperparameters
Training hyperparameters (learning rate, batch size, ALBERT dropout rate,
classifier dropout rate, warmup steps, and training steps) were taken from
Table A.4 of:
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
The maximum sequence length (MSL) was set to 128, which differs from the value
used in that publication.
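
To illustrate how warmup steps and training steps interact with the learning rate, here is a minimal sketch of one common schedule: linear warmup followed by linear decay. This card does not state the shape of the schedule that was actually used, so the shape is an assumption. The function name `lr_at_step` and the numbers in the example are hypothetical placeholders, not the values from Table A.4.

```python
def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given step under linear warmup, then linear decay.

    Assumption: this schedule shape is illustrative; the card does not
    confirm which schedule was used.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical placeholder values, for illustration only.
for step in (0, 5, 10, 55, 100):
    print(step, lr_at_step(step, base_lr=1.0, warmup_steps=10, total_steps=100))
```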
## Evaluation
F1 score is used to evaluate model performance.
### Testing Data, Factors & Metrics
#### Testing Data
See https://huggingface.co/datasets/glue#qqp
#### Metrics
F1 score
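
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below assumes binary F1 computed on the positive ("duplicate") class, which is the usual convention for QQP; the card does not state the averaging mode. The function name `f1_score` and the toy labels are illustrative only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # precision = recall = 2/3, so F1 = 2/3
```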
### Results
- **Training F1 score:** 0.9555347548257284
- **Evaluation F1 score:** 0.87304693979101
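
The difference between the two reported scores gives a rough sense of how much performance drops on held-out data. A short calculation using only the numbers above:

```python
train_f1 = 0.9555347548257284  # training F1 reported in this card
eval_f1 = 0.87304693979101     # evaluation F1 reported in this card

gap = train_f1 - eval_f1
print(f"train/eval F1 gap: {gap:.4f}")  # train/eval F1 gap: 0.0825
```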
## Environmental Impact
The model was finetuned on a single-user workstation with a single GPU. CO2
emissions are expected to be minimal.