|
--- |
|
language: |
|
- "en" |
|
license: mit |
|
datasets: |
|
- glue |
|
metrics: |
|
- accuracy
|
--- |
|
|
|
|
|
# Model Card for cdhinrichs/albert-large-v2-mnli |
|
This model was finetuned on the GLUE MNLI task, starting from the pretrained albert-large-v2 model. Hyperparameters were largely taken from the following publication, with minor exceptions noted below.
|
|
|
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
|
https://arxiv.org/abs/1909.11942 |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Developed by:** https://huggingface.co/cdhinrichs |
|
- **Model type:** Text Sequence Classification |
|
- **Language(s) (NLP):** English |
|
- **License:** MIT |
|
- **Finetuned from model:** https://huggingface.co/albert-large-v2 |
|
|
|
## Uses |
|
Text classification, research and development. |
|
|
|
### Out-of-Scope Use |
|
Not intended for production use. |
|
See https://huggingface.co/albert-large-v2 |
|
|
|
## Bias, Risks, and Limitations |
|
See https://huggingface.co/albert-large-v2 |
|
|
|
### Recommendations |
|
See https://huggingface.co/albert-large-v2 |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
from transformers import AlbertForSequenceClassification, AlbertTokenizer

# The tokenizer is inherited from the albert-large-v2 base model.
tokenizer = AlbertTokenizer.from_pretrained("cdhinrichs/albert-large-v2-mnli")
model = AlbertForSequenceClassification.from_pretrained("cdhinrichs/albert-large-v2-mnli")
```
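
As a usage sketch, an MNLI premise/hypothesis pair can be classified as follows. The example sentences are illustrative, and the label name is read from the model's own `id2label` mapping rather than hard-coded, since label order can vary between checkpoints.

```python
import torch

# Illustrative premise/hypothesis pair.
premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# MNLI is a sentence-pair task: encode the premise and hypothesis together.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to a label name via the model config.
predicted = model.config.id2label[logits.argmax(dim=-1).item()]
print(predicted)
```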
|
|
|
## Training Details |
|
|
|
### Training Data |
|
See https://huggingface.co/datasets/glue#mnli |
|
|
|
MNLI (Multi-Genre Natural Language Inference) is a sentence-pair classification task and part of the GLUE benchmark.
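
As a brief illustration, the data can be loaded with the Hugging Face `datasets` library; this is a sketch, not necessarily how the training data was prepared here.

```python
from datasets import load_dataset

# Each MNLI example has a premise, a hypothesis, and an integer label
# (0 = entailment, 1 = neutral, 2 = contradiction).
mnli = load_dataset("glue", "mnli")
print(mnli["train"][0])
```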
|
|
|
|
|
### Training Procedure |
|
The pretrained ALBERT model at https://huggingface.co/albert-large-v2 was finetuned using the Adam optimizer.
|
|
|
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
|
https://arxiv.org/abs/1909.11942 |
|
|
|
|
|
#### Training Hyperparameters |
|
Training hyperparameters (learning rate, batch size, ALBERT dropout rate, classifier dropout rate, warmup steps, and training steps) were taken from Table A.4 in:
|
|
|
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
|
https://arxiv.org/abs/1909.11942 |
|
|
|
The maximum sequence length (MSL) was set to 128, differing from the value used in the paper.
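
A minimal sketch of how these settings map onto `transformers`, assuming the standard `AlbertConfig` and `TrainingArguments` fields. The numeric values below are placeholders to be filled in from Table A.4, not values reported by this card, apart from the max sequence length of 128.

```python
from transformers import AlbertConfig, AlbertForSequenceClassification, TrainingArguments

# Dropout rates are set through the model config; numbers are placeholders.
config = AlbertConfig.from_pretrained(
    "albert-large-v2",
    num_labels=3,                 # entailment / neutral / contradiction
    hidden_dropout_prob=0.0,      # "ALBERT dropout rate" (placeholder)
    classifier_dropout_prob=0.0,  # "classifier dropout rate" (placeholder)
)
model = AlbertForSequenceClassification.from_pretrained("albert-large-v2", config=config)

# Optimization settings; AdamW is the Trainer default optimizer.
training_args = TrainingArguments(
    output_dir="albert-large-v2-mnli",
    learning_rate=3e-5,              # placeholder; see Table A.4
    per_device_train_batch_size=32,  # placeholder; see Table A.4
    warmup_steps=1000,               # placeholder; see Table A.4
    max_steps=10000,                 # placeholder; see Table A.4
)

# Inputs are tokenized with max_length=128, per this card.
```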
|
|
|
|
|
## Evaluation |
|
Classification accuracy is used to evaluate model performance. |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
See https://huggingface.co/datasets/glue#mnli |
|
|
|
#### Metrics |
|
Classification accuracy |
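
As a sketch of the metric, accuracy on the matched validation split can be computed with the `evaluate` library, reusing the tokenizer and model loaded above and assuming the model's label ids match the GLUE label encoding.

```python
import torch
import evaluate
from datasets import load_dataset

accuracy = evaluate.load("accuracy")
val = load_dataset("glue", "mnli", split="validation_matched")

model.eval()
preds, refs = [], []
for example in val.select(range(100)):  # small sample for illustration
    inputs = tokenizer(
        example["premise"], example["hypothesis"],
        truncation=True, max_length=128, return_tensors="pt",
    )
    with torch.no_grad():
        preds.append(model(**inputs).logits.argmax(dim=-1).item())
    refs.append(example["label"])

print(accuracy.compute(predictions=preds, references=refs))
```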
|
|
|
### Results |
|
Training classification accuracy: 0.9568

Evaluation classification accuracy: 0.8657
|
|
|
|
|
## Environmental Impact |
|
The model was finetuned on a single user workstation with a single GPU. CO2 impact is expected to be minimal.
|
|
|
|