flan-t5-large-train_r_aug-tqa

This model is a fine-tuned version of google/flan-t5-large on an unknown dataset. Its evaluation-set results (validation loss and generation length) at each logged checkpoint appear in the table of training results below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 8
total_eval_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 3.0
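
The two total batch sizes in the list follow from the per-device values. Here is a minimal sketch of that arithmetic. It assumes the usual convention that the total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps, and that evaluation uses no accumulation. The variable names are illustrative and not taken from the training script.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2             # per device
eval_batch_size = 8              # per device
num_devices = 2
gradient_accumulation_steps = 2

# Assumed convention: examples consumed per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 8
print(total_eval_batch_size)   # 16
```

Both results match the reported total_train_batch_size and total_eval_batch_size.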

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Gen Len
0.8038	0.1659	1000	0.6971	0.0137	44.2541
0.8712	0.3319	2000	0.6793	0.0137	47.4154
0.7521	0.4978	3000	0.6684	0.0137	41.3979
0.6859	0.6638	4000	0.6545	0.0137	45.1304
0.7234	0.8297	5000	0.6476	0.0137	45.7875
0.7439	0.9957	6000	0.6396	0.0137	40.4657
0.6351	1.1616	7000	0.6441	0.0137	42.3456
0.6851	1.3276	8000	0.6383	0.0137	43.8663
0.6932	1.4935	9000	0.6330	0.0137	45.9931
0.6506	1.6595	10000	0.6318	0.0137	42.6188
0.6577	1.8254	11000	0.6277	0.0137	45.5687
0.6659	1.9914	12000	0.6242	0.0137	45.6732
0.5711	2.1573	13000	0.6308	0.0137	45.9669
0.5918	2.3233	14000	0.6282	0.0137	44.9547
0.6076	2.4892	15000	0.6291	0.0137	43.6648
0.5828	2.6552	16000	0.6272	0.0137	43.8111
0.6006	2.8211	17000	0.6256	0.0137	44.7149
0.5540	2.9871	18000	0.6251	0.0137	44.5414
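
To read the table more easily, the sketch below finds the checkpoint with the lowest validation loss and estimates the number of optimizer steps per epoch. The rows are copied from the table above. The steps-per-epoch and dataset-size figures are rough estimates derived from the logged epoch fractions, not values reported by the card.

```python
# (train_loss, epoch, step, val_loss) copied from the table above;
# the "Model Preparation Time" and "Gen Len" columns are omitted.
rows = [
    (0.8038, 0.1659, 1000, 0.6971),
    (0.8712, 0.3319, 2000, 0.6793),
    (0.7521, 0.4978, 3000, 0.6684),
    (0.6859, 0.6638, 4000, 0.6545),
    (0.7234, 0.8297, 5000, 0.6476),
    (0.7439, 0.9957, 6000, 0.6396),
    (0.6351, 1.1616, 7000, 0.6441),
    (0.6851, 1.3276, 8000, 0.6383),
    (0.6932, 1.4935, 9000, 0.6330),
    (0.6506, 1.6595, 10000, 0.6318),
    (0.6577, 1.8254, 11000, 0.6277),
    (0.6659, 1.9914, 12000, 0.6242),
    (0.5711, 2.1573, 13000, 0.6308),
    (0.5918, 2.3233, 14000, 0.6282),
    (0.6076, 2.4892, 15000, 0.6291),
    (0.5828, 2.6552, 16000, 0.6272),
    (0.6006, 2.8211, 17000, 0.6256),
    (0.5540, 2.9871, 18000, 0.6251),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
print(best[2], best[3])  # 12000 0.6242

# Rough optimizer steps per epoch, estimated from the last logged row.
_, last_epoch, last_step, _ = rows[-1]
steps_per_epoch = last_step / last_epoch
print(round(steps_per_epoch))  # 6026

# Rough training-set size, assuming 8 examples per optimizer step
# (the reported total_train_batch_size) and no dropped batches.
approx_examples = steps_per_epoch * 8
print(round(approx_examples, -3))
```

On these numbers the lowest validation loss is at step 12000, near the end of the second epoch. Training loss is lower in the third epoch than in the second, while validation loss stays slightly above that minimum.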

Safetensors

Model size: 0.8B params

Tensor type: F32
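
The parameter count and tensor type give a rough size for the stored weights. The sketch below is a back-of-the-envelope estimate. It covers the weights only, not activations or optimizer state, and it uses the rounded 0.8B figure, so the result is approximate.

```python
# Values from the model card: roughly 0.8 billion parameters stored as F32.
num_params = 0.8e9       # rounded figure, so the result is approximate
bytes_per_param = 4      # F32 = 32 bits = 4 bytes

weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e9:.1f} GB")    # 3.2 GB
print(f"{weight_bytes / 2**30:.2f} GiB")
```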
