lgcharpe/ELC_BERT_baby_100M

	Hyperparameters for GLUE and SuperGLUE:
	- Learning rate: 5e-5
	- Batch size: 64
	- Max epochs: 10
	- Patience: 10 (for CoLA, MRPC, RTE, BoolQ, MultiRC, and WSC), 100 (for MNLI, QQP, QNLI, and SST-2)
	- Random seed: 12
	- Weight decay: 0.1
	- Warmup ratio: 0.1
	- Learning rate scheduler: cosine
	- Eval strategy: epoch (for CoLA, MRPC, RTE, BoolQ, MultiRC, and WSC), steps (for MNLI, QQP, QNLI, and SST-2)
	- Eval every: 1 epoch (for CoLA, MRPC, RTE, BoolQ, MultiRC, and WSC), 200 steps (for SST-2 and QNLI), 500 steps (for MNLI and QQP)
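The GLUE/SuperGLUE settings above are shared across tasks except for patience and evaluation cadence, which split into three groups. The following is a minimal Python sketch of that lookup, offered only as an illustration: `FinetuneConfig` and `glue_config` are hypothetical names, not part of any released training script, and `eval_every` is assumed to count epochs under the epoch strategy and optimizer steps under the step strategy.

```python
from dataclasses import dataclass

# Tasks evaluated once per epoch, as grouped in the hyperparameter list.
EPOCH_EVAL_TASKS = {"CoLA", "MRPC", "RTE", "BoolQ", "MultiRC", "WSC"}


@dataclass(frozen=True)
class FinetuneConfig:
    learning_rate: float
    batch_size: int
    max_epochs: int
    patience: int
    seed: int
    weight_decay: float
    warmup_ratio: float
    lr_scheduler: str
    eval_strategy: str  # "epoch" or "steps"
    eval_every: int  # epochs if eval_strategy == "epoch", optimizer steps otherwise


def glue_config(task: str) -> FinetuneConfig:
    """Return the fine-tuning hyperparameters listed for a (Super)GLUE task."""
    if task in EPOCH_EVAL_TASKS:
        patience, strategy, every = 10, "epoch", 1
    elif task in {"SST-2", "QNLI"}:
        patience, strategy, every = 100, "steps", 200
    elif task in {"MNLI", "QQP"}:
        patience, strategy, every = 100, "steps", 500
    else:
        # The card lists no settings for other tasks, so refuse to guess.
        raise ValueError(f"no hyperparameters listed for task {task!r}")
    # Everything below is shared by all listed tasks.
    return FinetuneConfig(
        learning_rate=5e-5,
        batch_size=64,
        max_epochs=10,
        patience=patience,
        seed=12,
        weight_decay=0.1,
        warmup_ratio=0.1,
        lr_scheduler="cosine",
        eval_strategy=strategy,
        eval_every=every,
    )
```

For example, `glue_config("QQP")` gives step-based evaluation every 500 steps with a patience of 100, while `glue_config("RTE")` evaluates once per epoch with a patience of 10. Tasks the card does not list (such as STS-B) raise `ValueError` instead of falling back to invented defaults.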

	Hyperparameters for MSGS:
	- Learning rate: 5e-5 (for CR, SC, RP, MV_RTP, and SC_LC), 1.5e-5 (for LC), 1e-5 (for SC_RP), 8e-6 (for MV_LC), 5e-6 (for MV), 5e-7 (for CR_LC)
	- Batch size: 32
	- Max epochs: 10 (for CR, SC, RP, MV_RTP, SC_LC, SC_RP, MV, and CR_LC), 3 (for LC), 5 (for MV_LC)
	- Patience: 10 (for CR, SC, RP, MV_RTP, SC_LC, SC_RP, MV, and CR_LC), 3 (for LC), 5 (for MV_LC)
	- Random seed: 12
	- Weight decay: 0.1
	- Warmup ratio: 0.1
	- Learning rate scheduler: cosine
	- Eval strategy: epoch
	- Eval every: 1 epoch
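For MSGS, the learning rate and epoch budget vary per task, and for every task the maximum number of epochs equals the patience. The following Python sketch encodes that table together with one common form of a cosine schedule with linear warmup over the first 10% of steps. The card only says "cosine" with a warmup ratio of 0.1, so the exact shape (linear warmup, decay to zero, warmup steps rounded up) is an assumption, and `msgs_settings` and `cosine_with_warmup` are hypothetical helpers rather than the original code.

```python
import math

# Per-task MSGS learning rates, as listed in the card.
MSGS_LR = {
    "CR": 5e-5, "SC": 5e-5, "RP": 5e-5, "MV_RTP": 5e-5, "SC_LC": 5e-5,
    "LC": 1.5e-5, "SC_RP": 1e-5, "MV_LC": 8e-6, "MV": 5e-6, "CR_LC": 5e-7,
}
# Max epochs (and patience) differ only for LC and MV_LC; every other task uses 10.
MSGS_EPOCHS = {"LC": 3, "MV_LC": 5}


def msgs_settings(task: str) -> dict:
    """Return the MSGS fine-tuning hyperparameters listed for `task`."""
    if task not in MSGS_LR:
        raise ValueError(f"no hyperparameters listed for MSGS task {task!r}")
    epochs = MSGS_EPOCHS.get(task, 10)
    return {
        "learning_rate": MSGS_LR[task],
        "batch_size": 32,
        "max_epochs": epochs,
        "patience": epochs,  # equal to max epochs for every listed task
        "seed": 12,
        "weight_decay": 0.1,
        "warmup_ratio": 0.1,
        "lr_scheduler": "cosine",
        "eval_strategy": "epoch",
        "eval_every": 1,  # in epochs
    }


def cosine_with_warmup(step: int, total_steps: int, peak_lr: float,
                       warmup_ratio: float = 0.1) -> float:
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 1000 total steps and a peak of 5e-5, the rate climbs linearly over the first 100 steps, sits at 5e-5 at step 100, falls to half the peak at step 550, and reaches zero at step 1000. Very small peak rates such as 5e-7 for CR_LC pass through the same shape unchanged; only the scale differs.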