	---
	language:
	- "en"
	license: mit
	datasets:
	- glue
	metrics:
	- f1
	---


	# Model Card for WeightWatcher/albert-large-v2-mrpc
	This model is the pretrained albert-large-v2 model finetuned on the MRPC task
	from the GLUE benchmark. Hyperparameters largely follow the publication below;
	the exceptions are listed under Training Details.

	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
	https://arxiv.org/abs/1909.11942

	## Model Details

	### Model Description
	- Developed by: https://huggingface.co/cdhinrichs
	- Model type: Text Sequence Classification
	- Language(s) (NLP): English
	- License: MIT
	- Finetuned from model: https://huggingface.co/albert-large-v2

	## Uses
	Text classification, research and development.

	### Out-of-Scope Use
	Not intended for production use.
	See https://huggingface.co/albert-large-v2

	## Bias, Risks, and Limitations
	This model inherits the biases, risks, and limitations of its base model. See
	https://huggingface.co/albert-large-v2

	### Recommendations
	See the recommendations for the base model at
	https://huggingface.co/albert-large-v2


	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from transformers import AlbertForSequenceClassification
	model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-mrpc")
	```
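
The loading call above can be extended into a sentence-pair prediction. The following is a minimal sketch, not the author's code. It assumes that the repository ships tokenizer files (if it does not, loading the tokenizer from `albert-large-v2` is the likely fallback) and that the label indices follow the GLUE MRPC convention (0 = not equivalent, 1 = equivalent). The names `softmax`, `classify_pair`, and `LABELS` are hypothetical helpers introduced for illustration.

```python
import math

MODEL_ID = "WeightWatcher/albert-large-v2-mrpc"

# Assumption: label indices follow the GLUE MRPC dataset convention.
LABELS = {0: "not_equivalent", 1: "equivalent"}


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pair(sentence1, sentence2, max_length=128):
    """Return (label, probability) for a sentence pair.

    Downloads the model on first use, so it needs network access.
    """
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AlbertForSequenceClassification, AutoTokenizer

    # Assumption: the model repository includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AlbertForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # MRPC is a sentence-pair task, so both sentences are encoded together.
    inputs = tokenizer(
        sentence1,
        sentence2,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS.get(best, str(best)), probs[best]
```

A call such as `classify_pair("The company posted record profits.", "Profits at the company hit an all-time high.")` returns the predicted label and its probability. Checking `model.config.id2label` is a reasonable way to confirm the label mapping before relying on it.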

	## Training Details

	### Training Data
	See https://huggingface.co/datasets/glue#mrpc

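	MRPC (Microsoft Research Paraphrase Corpus) is a binary sentence-pair
	classification task: decide whether two sentences are paraphrases of each
	other. It is part of the GLUE benchmark.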


	### Training Procedure
	The pretrained ALBERT model at https://huggingface.co/albert-large-v2 was
	finetuned with the Adam optimizer.

	Finetuning did NOT start from an MNLI checkpoint, which differs from the setup
	described in footnote 4 of:

	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
	https://arxiv.org/abs/1909.11942


	#### Training Hyperparameters
	The training hyperparameters (learning rate, batch size, ALBERT dropout rate,
	classifier dropout rate, warmup steps, and training steps) were taken from
	Table A.4 of:

	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
	https://arxiv.org/abs/1909.11942

	The maximum sequence length (MSL) was set to 128, which differs from the value
	given in that table.
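
To illustrate what a maximum sequence length of 128 means for a sentence-pair task, the sketch below trims two token lists so that they fit together with the special tokens. It is an illustration only, not the training code: it assumes the usual ALBERT pair layout `[CLS] A [SEP] B [SEP]` (three special tokens) and a longest-first trimming strategy. In practice the tokenizer does this internally when called with `truncation=True, max_length=128`. The function name `truncate_pair` is hypothetical.

```python
def truncate_pair(tokens_a, tokens_b, max_seq_length=128, num_special_tokens=3):
    """Trim two token lists so the encoded pair fits in max_seq_length.

    Tokens are removed one at a time from the end of whichever list is
    currently longer (longest-first truncation).
    """
    if max_seq_length < num_special_tokens:
        raise ValueError("max_seq_length is too small for the special tokens")

    a, b = list(tokens_a), list(tokens_b)
    budget = max_seq_length - num_special_tokens  # room left for real tokens
    while len(a) + len(b) > budget:
        longer = a if len(a) >= len(b) else b
        longer.pop()
    return a, b


# Example with placeholder token ids: 100 + 50 tokens exceed the budget of 125.
a, b = truncate_pair(list(range(100)), list(range(50)))
print(len(a), len(b))  # 75 50
```

Pairs that already fit are returned unchanged, so short inputs are unaffected by the limit.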


	## Evaluation
	F1 score is used to evaluate model performance.


	### Testing Data, Factors & Metrics

	#### Testing Data
	See https://huggingface.co/datasets/glue#mrpc

	#### Metrics
	F1 score
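
For reference, the F1 score is the harmonic mean of precision and recall on the positive class. The sketch below computes it in plain Python, assuming the positive class is label 1 (paraphrase), as in the GLUE MRPC convention. The labels in the example are made up for illustration and are not outputs of this model.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 3 true positives, 1 false positive,
# 1 false negative, so precision = recall = 0.75.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # 0.75
```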

	### Results
	Training F1 score: 0.9963621665319321

	Evaluation F1 score: 0.9176882661996497


	## Environmental Impact
	The model was finetuned on a single-user workstation with a single GPU. The
	CO2 impact is expected to be minimal.