TrTr-CMR-SYDNEY-MS-captioning

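This model is a fine-tuned version of a base model that is not specified, trained on an unknown dataset. It achieves the following results on the evaluation set (these values match the last logged row of the training results table, epoch 25 at step 975):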

Loss: 1.0235
Accuracy: 58.76
Bleu-1: 0.8541
Bleu-2: 0.8006
Bleu-3: 0.7487
Bleu-4: 0.6993
Meteor: 0.7933
Rouge-l: 0.7655
Cider: 3.0086

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 64
seed: 50
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1024
num_epochs: 128
mixed_precision_training: Native AMP
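
To make the interaction between learning_rate, lr_scheduler_type and lr_scheduler_warmup_steps concrete, here is a minimal sketch of a linear warmup followed by linear decay. It follows the usual shape of a "linear" schedule with warmup; it is not confirmed to be the exact scheduler code used for this run. The steps-per-epoch value (39) is an assumption read off the training results table, and the total step count (39 × 128 = 4992) assumes training was planned for the full num_epochs. The function name linear_schedule_lr is hypothetical.

```python
# Sketch of a linear warmup-then-decay learning-rate schedule.
BASE_LR = 1e-4          # learning_rate from the card
WARMUP_STEPS = 1024     # lr_scheduler_warmup_steps from the card
STEPS_PER_EPOCH = 39    # assumption: taken from the results table (step 39 at epoch 1.0)
PLANNED_EPOCHS = 128    # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * PLANNED_EPOCHS  # assumption: 4992 planned optimizer steps


def linear_schedule_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay: ramp linearly from BASE_LR down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 39, 975, 1024, 4992):
        print(f"step {step:>4}: lr = {linear_schedule_lr(step):.3e}")
```

Under these assumptions, the last logged step (975) is still inside the 1024-step warmup, so the learning rate there would be about 95% of the base rate.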

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Bleu-1	Bleu-2	Bleu-3	Bleu-4	Meteor	Rouge-l	Cider
No log	1.0	39	9.5700	65.19	0.0216	0.0015	0.0007	0.0005	0.0279	0.0799	0.0178
No log	2.0	78	4.1754	65.49	0.0379	0.0018	0.0008	0.0006	0.0577	0.1657	0.0117
No log	3.0	117	3.5413	65.74	0.2514	0.1586	0.0767	0.0187	0.1411	0.2765	0.1280
No log	4.0	156	2.8280	56.59	0.4240	0.3439	0.1929	0.1080	0.3176	0.4125	0.3287
No log	5.0	195	2.0664	54.97	0.6707	0.5901	0.4928	0.4076	0.5716	0.5987	1.4822
No log	6.0	234	1.6072	55.09	0.7323	0.6586	0.5768	0.4978	0.6541	0.6633	1.9934
No log	7.0	273	1.3725	60.7	0.8113	0.7281	0.6465	0.5685	0.6930	0.6957	2.2724
No log	8.0	312	1.2215	60.57	0.8166	0.7280	0.6365	0.5539	0.7279	0.7186	2.3927
No log	9.0	351	1.1166	59.36	0.8172	0.7370	0.6502	0.5669	0.7479	0.7404	2.4606
No log	10.0	390	1.0858	60.98	0.8254	0.7438	0.6643	0.5904	0.7643	0.7445	2.4932
No log	11.0	429	1.0154	58.16	0.8209	0.7438	0.6675	0.5908	0.7556	0.7352	2.4243
No log	12.0	468	0.9940	59.48	0.8179	0.7341	0.6502	0.5687	0.7543	0.7421	2.5154
No log	13.0	507	0.9646	57.9	0.8204	0.7470	0.6773	0.6090	0.7776	0.7448	2.6404
No log	14.0	546	0.9777	58.38	0.8203	0.7442	0.6672	0.5905	0.7714	0.7432	2.5989
No log	15.0	585	0.9076	57.9	0.8647	0.8039	0.7501	0.6976	0.8136	0.7911	3.1252
No log	16.0	624	0.9375	56.43	0.8298	0.7695	0.7144	0.6630	0.8087	0.7669	2.8870
No log	17.0	663	0.9850	55.74	0.8266	0.7412	0.6682	0.5989	0.7825	0.7382	2.6386
No log	18.0	702	0.9649	55.53	0.8539	0.7830	0.7139	0.6444	0.7944	0.7638	2.7532
No log	19.0	741	0.9414	59.45	0.8439	0.7701	0.6994	0.6318	0.7585	0.7510	2.7099
No log	20.0	780	0.9716	56.31	0.8280	0.7538	0.6811	0.6130	0.7836	0.7536	2.6435
No log	21.0	819	1.0360	57.58	0.8268	0.7444	0.6703	0.6009	0.7439	0.7311	2.5731
No log	22.0	858	0.9405	55.54	0.8197	0.7381	0.6757	0.6234	0.7705	0.7430	2.7865
No log	23.0	897	1.0226	56.77	0.8227	0.7515	0.6830	0.6153	0.7648	0.7266	2.7045
No log	24.0	936	1.0538	55.75	0.8286	0.7471	0.6761	0.6129	0.7580	0.7454	2.7123
No log	25.0	975	1.0235	58.76	0.8541	0.8006	0.7487	0.6993	0.7933	0.7655	3.0086
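
The table can be scanned for the strongest checkpoint per metric. The following is a minimal sketch with the validation loss, Bleu-4 and Cider columns copied from the rows above; the helper name best_epoch is hypothetical and not part of any library.

```python
# (epoch, validation_loss, bleu4, cider) copied from the training results table.
ROWS = [
    (1, 9.5700, 0.0005, 0.0178),
    (2, 4.1754, 0.0006, 0.0117),
    (3, 3.5413, 0.0187, 0.1280),
    (4, 2.8280, 0.1080, 0.3287),
    (5, 2.0664, 0.4076, 1.4822),
    (6, 1.6072, 0.4978, 1.9934),
    (7, 1.3725, 0.5685, 2.2724),
    (8, 1.2215, 0.5539, 2.3927),
    (9, 1.1166, 0.5669, 2.4606),
    (10, 1.0858, 0.5904, 2.4932),
    (11, 1.0154, 0.5908, 2.4243),
    (12, 0.9940, 0.5687, 2.5154),
    (13, 0.9646, 0.6090, 2.6404),
    (14, 0.9777, 0.5905, 2.5989),
    (15, 0.9076, 0.6976, 3.1252),
    (16, 0.9375, 0.6630, 2.8870),
    (17, 0.9850, 0.5989, 2.6386),
    (18, 0.9649, 0.6444, 2.7532),
    (19, 0.9414, 0.6318, 2.7099),
    (20, 0.9716, 0.6130, 2.6435),
    (21, 1.0360, 0.6009, 2.5731),
    (22, 0.9405, 0.6234, 2.7865),
    (23, 1.0226, 0.6153, 2.7045),
    (24, 1.0538, 0.6129, 2.7123),
    (25, 1.0235, 0.6993, 3.0086),
]

VAL_LOSS, BLEU4, CIDER = 1, 2, 3  # column indices into each row


def best_epoch(rows, column, higher_is_better):
    """Return the epoch whose value in `column` is best."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


best_by_loss = best_epoch(ROWS, VAL_LOSS, higher_is_better=False)
best_by_cider = best_epoch(ROWS, CIDER, higher_is_better=True)
best_by_bleu4 = best_epoch(ROWS, BLEU4, higher_is_better=True)

print("lowest validation loss: epoch", best_by_loss)
print("highest Cider:          epoch", best_by_cider)
print("highest Bleu-4:         epoch", best_by_bleu4)
```

On these rows, epoch 15 has both the lowest validation loss (0.9076) and the highest Cider (3.1252), while epoch 25 has the highest Bleu-4 (0.6993), so the choice of checkpoint depends on which metric matters most.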

Framework versions

Transformers 5.8.1
PyTorch 2.12.0+cu130
Datasets 5.0.0
Tokenizers 0.22.2

Safetensors

Model size: 0.1B params
Tensor type: I64, F32
