---
language:
- "en"
license: mit
datasets:
- glue
metrics:
- matthews_correlation
---
# Model Card for WeightWatcher/albert-large-v2-cola
This model was finetuned on the CoLA task from the GLUE benchmark, starting from the
pretrained albert-large-v2 model. Hyperparameters were largely taken from the following
publication, with some minor exceptions:
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
## Model Details
### Model Description
- **Developed by:** https://huggingface.co/cdhinrichs
- **Model type:** Text Sequence Classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** https://huggingface.co/albert-large-v2
## Uses
Text classification, research and development.
### Out-of-Scope Use
Not intended for production use.
See https://huggingface.co/albert-large-v2
## Bias, Risks, and Limitations
See https://huggingface.co/albert-large-v2
### Recommendations
See https://huggingface.co/albert-large-v2
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AlbertForSequenceClassification

# Load the finetuned ALBERT model together with its sequence-classification head
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-cola")
```
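A minimal inference sketch follows. It assumes the tokenizer was uploaded alongside the
model weights (if not, the base albert-large-v2 tokenizer can be used instead) and that
the label mapping follows the CoLA convention (0 = unacceptable, 1 = acceptable):
```python
import torch
from transformers import AlbertForSequenceClassification, AlbertTokenizer

model_id = "WeightWatcher/albert-large-v2-cola"
tokenizer = AlbertTokenizer.from_pretrained(model_id)  # assumed to be in the repo
model = AlbertForSequenceClassification.from_pretrained(model_id)
model.eval()

# Classify a sentence as linguistically acceptable (1) or unacceptable (0)
inputs = tokenizer("The boys was playing outside.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```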
## Training Details
### Training Data
CoLA (the Corpus of Linguistic Acceptability) is a binary classification task and part of
the GLUE benchmark. See https://huggingface.co/datasets/glue#cola
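For reference, a sketch of loading CoLA with the datasets library:
```python
from datasets import load_dataset

# Load the CoLA subset of GLUE; splits are "train", "validation", and "test"
cola = load_dataset("glue", "cola")
print(cola["train"][0])  # e.g. {'sentence': ..., 'label': 1, 'idx': 0}
```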
### Training Procedure
The Adam optimizer was used to finetune the pretrained ALBERT model at
https://huggingface.co/albert-large-v2, following:
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
#### Training Hyperparameters
Training hyperparameters (learning rate, batch size, ALBERT dropout rate, classifier
dropout rate, warmup steps, and training steps) were taken from Table A.4 of
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
The maximum sequence length (MSL) was set to 128, which differs from the value used in
the paper.
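The sketch below shows what such a finetuning run might look like with the transformers
Trainer API. The hyperparameter values are illustrative placeholders only, not a record
of the actual run; substitute the CoLA row of Table A.4 from the ALBERT paper:
```python
from datasets import load_dataset
from transformers import (
    AlbertForSequenceClassification,
    AlbertTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")
model = AlbertForSequenceClassification.from_pretrained("albert-large-v2", num_labels=2)

def tokenize(batch):
    # MSL of 128, as noted above
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

cola = load_dataset("glue", "cola").map(tokenize, batched=True)

# Placeholder hyperparameters -- replace with the CoLA row of Table A.4
args = TrainingArguments(
    output_dir="albert-large-v2-cola",
    learning_rate=1e-5,              # placeholder
    per_device_train_batch_size=16,  # placeholder
    warmup_steps=320,                # placeholder
    max_steps=5336,                  # placeholder
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=cola["train"],
    eval_dataset=cola["validation"],
)
trainer.train()
```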
## Evaluation
The Matthews correlation coefficient (MCC) is used to evaluate model performance.
### Testing Data, Factors & Metrics
#### Testing Data
See https://huggingface.co/datasets/glue#cola
#### Metrics
Matthews correlation coefficient (MCC)
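As a sketch, MCC can be computed with scikit-learn from gold labels and model
predictions (toy values shown here):
```python
from sklearn.metrics import matthews_corrcoef

# Toy gold labels and predictions (0 = unacceptable, 1 = acceptable)
labels = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 1, 0, 0, 1]
print(matthews_corrcoef(labels, predictions))
```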
### Results
- Training Matthews correlation: 0.9786230864021822
- Evaluation Matthews correlation: 0.5723853959351589
## Environmental Impact
The model was finetuned on a single user workstation with a single GPU. CO2
impact is expected to be minimal.