cdhinrichs committed
Commit ae1ffbb
1 Parent(s): 65b13ff

Added a model card

Files changed (1):
  1. README.md +99 -0
README.md CHANGED

---
language:
- "en"
license: mit
datasets:
- glue
metrics:
- accuracy
---

# Model Card for cdhinrichs/albert-large-v2-rte

This model was fine-tuned on the RTE task from the GLUE benchmark, starting
from the pretrained albert-large-v2 model. Hyperparameters were largely taken
from the following publication, with some minor exceptions:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

## Model Details

### Model Description
- **Developed by:** https://huggingface.co/cdhinrichs
- **Model type:** Text Sequence Classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** https://huggingface.co/albert-large-v2

## Uses
Text classification, research and development.

### Out-of-Scope Use
Not intended for production use.
See https://huggingface.co/albert-large-v2

## Bias, Risks, and Limitations
See https://huggingface.co/albert-large-v2

### Recommendations
See https://huggingface.co/albert-large-v2

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AlbertForSequenceClassification

# Load the fine-tuned ALBERT sequence classification model for RTE.
model = AlbertForSequenceClassification.from_pretrained("cdhinrichs/albert-large-v2-rte")
```
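
A fuller, purely illustrative example is sketched below: it pairs the classifier
with a tokenizer and scores a single premise/hypothesis pair. The tokenizer
source, the example sentences, and the label mapping are assumptions rather than
guarantees of this repository; check `model.config.id2label` for the mapping
actually stored in the checkpoint.

```python
import torch
from transformers import AlbertTokenizer, AlbertForSequenceClassification

# Assumption: the standard albert-large-v2 tokenizer is used. If this repository
# ships its own tokenizer files, load it from "cdhinrichs/albert-large-v2-rte" instead.
tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")
model = AlbertForSequenceClassification.from_pretrained("cdhinrichs/albert-large-v2-rte")
model.eval()

# RTE is a sentence-pair task: does the premise entail the hypothesis?
premise = "A man is playing a guitar on stage."  # hypothetical example text
hypothesis = "A man is performing music."        # hypothetical example text

inputs = tokenizer(premise, hypothesis, return_tensors="pt",
                   truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

# In the GLUE/RTE dataset, label 0 is "entailment" and label 1 is "not_entailment".
print(logits.argmax(dim=-1).item())
```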

## Training Details

### Training Data
See https://huggingface.co/datasets/glue#rte

RTE (Recognizing Textual Entailment) is a binary sentence-pair classification
task and part of the GLUE benchmark.
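
As an illustrative aside (not part of the original training code), the data can
be inspected with the datasets library:

```python
from datasets import load_dataset

# Each RTE example has "sentence1" (premise), "sentence2" (hypothesis), and "label".
rte = load_dataset("glue", "rte")
print(rte)              # train / validation / test splits
print(rte["train"][0])  # one premise/hypothesis pair with its label
```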

### Training Procedure
Adam optimization was used on the pretrained ALBERT model at
https://huggingface.co/albert-large-v2.

A checkpoint from MNLI was NOT used, differing from footnote 4 of:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

#### Training Hyperparameters
Training hyperparameters (learning rate, batch size, ALBERT dropout rate,
classifier dropout rate, warmup steps, and training steps) were taken from
Table A.4 of:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

Max sequence length (MSL) was set to 128, differing from the above.
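
Purely as a sketch of such a setup (not the script actually used, and with
placeholder values rather than the Table A.4 hyperparameters), fine-tuning with
the transformers Trainer could look like this; the Trainer's default AdamW
optimizer matches the Adam-style optimization described above:

```python
from datasets import load_dataset
from transformers import (AlbertForSequenceClassification, AlbertTokenizer,
                          Trainer, TrainingArguments)

MAX_SEQ_LEN = 128  # per this card; the paper uses a longer maximum sequence length

tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")
model = AlbertForSequenceClassification.from_pretrained(
    "albert-large-v2",
    num_labels=2,
    hidden_dropout_prob=0.1,      # placeholder: ALBERT dropout rate from Table A.4
    classifier_dropout_prob=0.1,  # placeholder: classifier dropout rate from Table A.4
)

def encode(batch):
    # Tokenize premise/hypothesis pairs to a fixed length of MAX_SEQ_LEN tokens.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, max_length=MAX_SEQ_LEN, padding="max_length")

rte = load_dataset("glue", "rte").map(encode, batched=True)

args = TrainingArguments(
    output_dir="albert-large-v2-rte",
    learning_rate=2e-5,              # placeholder: learning rate from Table A.4
    per_device_train_batch_size=32,  # placeholder: batch size from Table A.4
    warmup_steps=200,                # placeholder: warmup steps from Table A.4
    max_steps=800,                   # placeholder: training steps from Table A.4
)

Trainer(model=model, args=args,
        train_dataset=rte["train"],
        eval_dataset=rte["validation"]).train()
```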

## Evaluation
Classification accuracy is used to evaluate model performance.

### Testing Data, Factors & Metrics

#### Testing Data
See https://huggingface.co/datasets/glue#rte

#### Metrics
Classification accuracy
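
The evaluation number reported below can be recomputed with a loop like the one
sketched here; it assumes evaluation on the GLUE validation split (RTE test
labels are not public) and reuses the tokenizer and max-length assumptions from
above:

```python
import torch
from datasets import load_dataset
from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")  # assumed tokenizer source
model = AlbertForSequenceClassification.from_pretrained("cdhinrichs/albert-large-v2-rte")
model.eval()

validation = load_dataset("glue", "rte", split="validation")

correct = 0
for example in validation:
    inputs = tokenizer(example["sentence1"], example["sentence2"],
                       return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])

print("Validation accuracy:", correct / len(validation))
```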

### Results
Training classification accuracy: 0.9971887550200803

Evaluation classification accuracy: 0.8014440433212996


## Environmental Impact
The model was fine-tuned on a single user workstation with a single GPU. CO2
impact is expected to be minimal.