---
language:
  - "en"
license: mit
datasets:
  - glue
metrics:
  - Classification accuracy
---


# Model Card for WeightWatcher/albert-large-v2-stsb
This model was finetuned on the GLUE/stsb task, starting from the pretrained 
albert-large-v2 model. Hyperparameters were largely taken from the following 
publication, with a few exceptions noted below.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

## Model Details

### Model Description
- **Developed by:** https://huggingface.co/cdhinrichs
- **Model type:** Text Sequence Classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** https://huggingface.co/albert-large-v2

## Uses
Text classification, research and development.

### Out-of-Scope Use
Not intended for production use.
See https://huggingface.co/albert-large-v2

## Bias, Risks, and Limitations
See https://huggingface.co/albert-large-v2

### Recommendations
See https://huggingface.co/albert-large-v2


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AlbertForSequenceClassification

# Load the finetuned STS-B model from the Hugging Face Hub
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-stsb")
```
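For inference, a minimal sketch follows. It assumes the tokenizer of the base
albert-large-v2 checkpoint is appropriate; whether the head emits a single
similarity score or class logits depends on the saved model config, which this
card does not spell out.

```python
import torch
from transformers import AlbertTokenizer, AlbertForSequenceClassification

# Assumption: the tokenizer matches the base albert-large-v2 checkpoint
tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-stsb")
model.eval()

# STS-B inputs are sentence pairs
inputs = tokenizer(
    "A man is playing a guitar.",
    "A person plays an instrument.",
    truncation=True,
    max_length=128,  # matches the MSL used in finetuning (see Training Hyperparameters)
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits
print(logits)
```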

## Training Details

### Training Data
See https://huggingface.co/datasets/glue#stsb

STSB (the Semantic Textual Similarity Benchmark) is a sentence-pair similarity task and part of the GLUE benchmark.


### Training Procedure 
Adam optimization was used to finetune the pretrained ALBERT model at 
https://huggingface.co/albert-large-v2.

A checkpoint from MNLI was NOT used, differing from footnote 4 in:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
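
As a hedged illustration of that starting point (the head configuration and
learning rate below are assumptions based on the standard GLUE recipe, not
confirmed details of this run):

```python
import torch
from transformers import AlbertForSequenceClassification

# Start from the pretrained base model, NOT an MNLI checkpoint
model = AlbertForSequenceClassification.from_pretrained(
    "albert-large-v2",
    num_labels=1,  # assumption: the standard single-output GLUE STS-B head
)

# Adam optimization, as described above; the learning rate is a placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
```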


#### Training Hyperparameters
Training hyperparameters (learning rate, batch size, ALBERT dropout rate, 
classifier dropout rate, warmup steps, and training steps) were taken from 
Table A.4 in:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

Max sequence length (MSL) was set to 128, differing from the above.
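
A sketch of how these hyperparameters might be wired into the transformers
Trainer API; every numeric value below is a placeholder standing in for the
Table A.4 values, not the number actually used.

```python
from transformers import TrainingArguments

# Placeholder values; the real numbers come from Table A.4 of the ALBERT paper
training_args = TrainingArguments(
    output_dir="albert-large-v2-stsb",
    learning_rate=2e-5,              # placeholder
    per_device_train_batch_size=16,  # placeholder
    warmup_steps=100,                # placeholder
    max_steps=1000,                  # placeholder
)

# The dropout rates are set on the model config (AlbertConfig), e.g.
# hidden_dropout_prob and classifier_dropout_prob, rather than here.
# MSL=128 is enforced at tokenization time:
#   tokenizer(s1, s2, truncation=True, max_length=128)
```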


## Evaluation
Classification accuracy is used to evaluate model performance.


### Testing Data, Factors & Metrics

#### Testing Data
See https://huggingface.co/datasets/glue#stsb
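
For reference, the data can be loaded with the datasets library (split names
follow the standard GLUE layout; evaluation is typically done on "validation"):

```python
from datasets import load_dataset

# Load the STS-B subset of the GLUE benchmark
stsb = load_dataset("glue", "stsb")
print(stsb["validation"][0])  # fields: sentence1, sentence2, label, idx
```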

#### Metrics
Classification accuracy
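
How the model's continuous outputs were discretized into classes is not
specified in this card; as a generic sketch, accuracy over discrete
predictions can be computed with the evaluate library:

```python
import evaluate

accuracy = evaluate.load("accuracy")
# Toy inputs: 3 of 4 predictions match the references
result = accuracy.compute(predictions=[1, 0, 1, 1], references=[1, 0, 0, 1])
print(result)  # {'accuracy': 0.75}
```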

### Results
Training Classification accuracy: 0.9971887550200803

Evaluation Classification accuracy: 0.8014440433212996


## Environmental Impact
The model was finetuned on a single user workstation with a single GPU. CO2 
impact is expected to be minimal.