
Model Card for WeightWatcher/albert-large-v2-mrpc

This model was fine-tuned on the GLUE MRPC task, starting from the pretrained albert-large-v2 model. Hyperparameters were largely taken from the following publication, with minor exceptions noted under Training Details below.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations https://arxiv.org/abs/1909.11942

Model Details

Model Description

An albert-large-v2 model fine-tuned for binary paraphrase classification on the MRPC task from the GLUE benchmark.

Uses

Text classification (paraphrase detection), and research and development.

Out-of-Scope Use

Not intended for production use. See https://huggingface.co/albert-large-v2

Bias, Risks, and Limitations

See https://huggingface.co/albert-large-v2

Recommendations

See https://huggingface.co/albert-large-v2

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer = AlbertTokenizer.from_pretrained("WeightWatcher/albert-large-v2-mrpc")
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-mrpc")
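
A minimal inference sketch follows, assuming the standard transformers API; the sentence pair is made up, and the label convention (1 = paraphrase, 0 = not a paraphrase) is the usual GLUE/MRPC one.

import torch

# Score a hypothetical sentence pair; MRPC is binary (1 = paraphrase, 0 = not).
inputs = tokenizer(
    "The company said profits rose in the quarter.",
    "Profits increased during the quarter, the company said.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()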

Training Details

Training Data

See https://huggingface.co/datasets/glue#mrpc

MRPC (the Microsoft Research Paraphrase Corpus) is a binary sentence-pair classification task and part of the GLUE benchmark.
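
For illustration, the data can be loaded with the datasets library; a sketch assuming the standard glue/mrpc loader:

from datasets import load_dataset

# Each example is a sentence pair with a binary paraphrase label.
mrpc = load_dataset("glue", "mrpc")
print(mrpc["train"][0])  # keys: sentence1, sentence2, label, idx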

Training Procedure

The pretrained ALBERT model at https://huggingface.co/albert-large-v2 was fine-tuned with the Adam optimizer.

Unlike footnote 4 of the ALBERT paper (ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, https://arxiv.org/abs/1909.11942), a checkpoint from MNLI was NOT used as the starting point for fine-tuning.

Training Hyperparameters

Training hyperparameters (learning rate, batch size, ALBERT dropout rate, classifier dropout rate, warmup steps, training steps) were taken from Table A.4 of the ALBERT paper (https://arxiv.org/abs/1909.11942).

The one exception: max sequence length (MSL) was set to 128 rather than the value given in the paper.
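
A sketch of how such a setup could be expressed with the transformers Trainer API is below. The numeric values are placeholders only, not the actual Table A.4 values (which are not reproduced in this card); only the max sequence length of 128 is stated above.

from transformers import AlbertConfig, TrainingArguments

# Placeholders only: substitute the MRPC values from Table A.4 of the ALBERT paper.
config = AlbertConfig.from_pretrained(
    "albert-large-v2",
    hidden_dropout_prob=0.0,      # placeholder for the ALBERT dropout rate
    classifier_dropout_prob=0.1,  # placeholder for the classifier dropout rate
)
args = TrainingArguments(
    output_dir="albert-large-v2-mrpc",
    learning_rate=2e-5,               # placeholder for the Table A.4 learning rate
    per_device_train_batch_size=32,   # placeholder for the Table A.4 batch size
    warmup_steps=200,                 # placeholder for the Table A.4 warmup steps
    max_steps=800,                    # placeholder for the Table A.4 training steps
)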

Evaluation

F1 score is used to evaluate model performance.

Testing Data, Factors & Metrics

Testing Data

See https://huggingface.co/datasets/glue#mrpc

Metrics

F1 score
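
The GLUE MRPC metric, which reports both accuracy and F1, can be computed with the evaluate library; a small sketch with made-up predictions:

import evaluate

metric = evaluate.load("glue", "mrpc")
# Made-up predictions and references, for illustration only.
result = metric.compute(predictions=[1, 0, 1], references=[1, 0, 0])
print(result)  # {'accuracy': ..., 'f1': ...}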

Results

Training F1 score: 0.9964

Evaluation F1 score: 0.9177

Environmental Impact

The model was fine-tuned on a single user workstation with a single GPU. CO2 impact is expected to be minimal.
