Model Card for WeightWatcher/albert-large-v2-rte

This model is the pretrained albert-large-v2 model fine-tuned on the GLUE RTE task. Hyperparameters were largely taken from the following publication, with the minor exceptions noted under Training Details.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations https://arxiv.org/abs/1909.11942

Model Details

Model Description

Developed by: https://huggingface.co/cdhinrichs
Model type: Text Sequence Classification
Language(s) (NLP): English
License: MIT
Finetuned from model: https://huggingface.co/albert-large-v2

Uses

Text classification, research and development.

Out-of-Scope Use

Not intended for production use. For the limitations inherited from the base model, see https://huggingface.co/albert-large-v2

Bias, Risks, and Limitations

See https://huggingface.co/albert-large-v2

Recommendations

See https://huggingface.co/albert-large-v2

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AlbertForSequenceClassification
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-rte")
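
As a minimal sketch of running the loaded model on one premise/hypothesis pair: the helper names `load_model_and_tokenizer` and `predict_label`, the choice to load the tokenizer from the base `albert-large-v2` repository, and the reuse of the 128-token limit from fine-tuning are assumptions made for illustration, not details confirmed by this card.

```python
import torch
from transformers import AlbertForSequenceClassification, AutoTokenizer


def load_model_and_tokenizer():
    """Download the fine-tuned classifier and a matching tokenizer."""
    model = AlbertForSequenceClassification.from_pretrained(
        "WeightWatcher/albert-large-v2-rte"
    )
    # Assumption: the tokenizer is taken from the base model repository.
    tokenizer = AutoTokenizer.from_pretrained("albert-large-v2")
    return model, tokenizer


def predict_label(model, tokenizer, premise, hypothesis, max_length=128):
    """Return the index of the highest-scoring class for one sentence pair."""
    # Encode both sentences as a single pair input, truncated to max_length.
    inputs = tokenizer(
        premise,
        hypothesis,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())
```

Calling `predict_label(*load_model_and_tokenizer(), premise, hypothesis)` returns a class index. How that index maps to a label name depends on the uploaded configuration (`model.config.id2label`), so it is worth checking before interpreting the result.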

Training Details

Training Data

See https://huggingface.co/datasets/glue#rte

RTE (Recognizing Textual Entailment) is a binary sentence-pair classification task and part of the GLUE benchmark.
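
For orientation, the sketch below shows the shape of one RTE record as exposed by the Hugging Face `glue`/`rte` dataset: the fields `sentence1`, `sentence2`, and `label`, where label 0 is `entailment` and 1 is `not_entailment`. The sample record and the helper `format_example` are hypothetical, written for illustration only, and are not drawn from the dataset.

```python
# Label names of the Hugging Face "glue"/"rte" dataset, indexed by label id.
RTE_LABEL_NAMES = ("entailment", "not_entailment")


def format_example(example):
    """Render one RTE record as a readable premise/hypothesis/label string."""
    label_id = example["label"]
    # The GLUE test split hides its labels and stores -1 instead.
    label = RTE_LABEL_NAMES[label_id] if label_id >= 0 else "unlabeled"
    return (
        f"premise: {example['sentence1']}\n"
        f"hypothesis: {example['sentence2']}\n"
        f"label: {label}"
    )


# Hypothetical record, not taken from the dataset.
sample = {
    "sentence1": "The company opened a new office in Berlin last year.",
    "sentence2": "The company has an office in Berlin.",
    "label": 0,
}
print(format_example(sample))
```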

Training Procedure

The pretrained ALBERT model at https://huggingface.co/albert-large-v2 was fine-tuned with the Adam optimizer.

Fine-tuning did NOT start from an MNLI checkpoint, which differs from the setup described in footnote 4 of:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations https://arxiv.org/abs/1909.11942

Training Hyperparameters

Training hyperparameters (learning rate, batch size, ALBERT dropout rate, classifier dropout rate, warmup steps, and training steps) were taken from Table A.4 in:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations https://arxiv.org/abs/1909.11942

Max sequence length (MSL) was set to 128, which differs from the value given in that table.
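
The card lists warmup steps and training steps among the hyperparameters but does not state the shape of the learning-rate schedule. As a rough illustration of how the two quantities typically interact, the sketch below assumes linear warmup followed by linear decay to zero. The function name `lr_at_step` and the numbers in the usage lines are placeholders, not the values from Table A.4 and not a confirmed description of how this model was trained.

```python
def lr_at_step(step, base_lr, warmup_steps, training_steps):
    """Learning rate at a given step under linear warmup, then linear decay.

    Assumption: this schedule shape is illustrative; the card does not
    specify which schedule was used.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, training_steps - step)
    return base_lr * remaining / max(1, training_steps - warmup_steps)


# Placeholder values chosen only to show the shape of the curve.
for step in (0, 1, 2, 6, 10):
    print(step, lr_at_step(step, base_lr=1.0, warmup_steps=2, training_steps=10))
```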

Evaluation

Classification accuracy is used to evaluate model performance.

Testing Data, Factors & Metrics

Testing Data

See https://huggingface.co/datasets/glue#rte

Metrics

Classification accuracy

Results

Training Classification accuracy: 0.9971887550200803

Evaluation Classification accuracy: 0.8014440433212996
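
Classification accuracy is the fraction of examples whose predicted label matches the reference label. The sketch below is a plain-Python version of that definition; the function name `accuracy` is illustrative. The counts in the final lines are an inference rather than something this card states: given the GLUE RTE split sizes of 2,490 training and 277 validation examples, the reported values are consistent with 2,483 and 222 correct predictions respectively, assuming the evaluation figure refers to the validation split.

```python
import math


def accuracy(predictions, references):
    """Fraction of predictions that equal the corresponding reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy example: three of four predictions match.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75

# Inferred counts (an assumption): correct predictions over split size.
print(math.isclose(2483 / 2490, 0.9971887550200803, rel_tol=1e-12))
print(math.isclose(222 / 277, 0.8014440433212996, rel_tol=1e-12))
```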

Environmental Impact

The model was finetuned on a single user workstation with a single GPU. CO2 impact is expected to be minimal.
