train_residue_list_lr_5e-4_5_epochs
This model is a fine-tuned version of GreatCaptainNemo/ProLLaMA_Stage_1 on the adpr_train dataset. It achieves the following results on the evaluation set:
- Loss: 0.5254
- Num Input Tokens Seen: 13416448
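The card itself does not include a usage example. Below is a minimal sketch, assuming this repository (jbenbudd/ADPrLlama) hosts a PEFT/LoRA adapter on top of GreatCaptainNemo/ProLLaMA_Stage_1; the prompt format expected by the adpr_train data is not documented here, so the prompt is a placeholder.

```python
# Minimal sketch (assumed usage): load the base ProLLaMA model and apply this adapter with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "GreatCaptainNemo/ProLLaMA_Stage_1"
adapter_id = "jbenbudd/ADPrLlama"  # repository listed in the model tree for this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "..."  # placeholder: use the prompt format of the adpr_train dataset
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```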
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 5.0
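These values map directly onto Hugging Face TrainingArguments fields; the sketch below mirrors them. The actual training script is not included in this card, so the output_dir and the logging/evaluation cadence are assumptions.

```python
# Sketch only: TrainingArguments mirroring the hyperparameters listed above.
# output_dir and the eval cadence are assumptions; the real script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_residue_list_lr_5e-4_5_epochs",  # assumed to match the run name
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,   # 16 x 8 = 128 total train batch size
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    eval_strategy="steps",           # assumption: evaluation every 100 steps, matching the results table
    eval_steps=100,
)
```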
Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 2.5191        | 0.4561 | 100  | 2.8151          | 1230464           |
| 0.7346        | 0.9122 | 200  | 0.6728          | 2455680           |
| 1.9608        | 1.3649 | 300  | 1.7802          | 3673744           |
| 3.2759        | 1.8210 | 400  | 1.0581          | 4901648           |
| 0.5697        | 2.2737 | 500  | 0.5570          | 6120272           |
| 0.5411        | 2.7298 | 600  | 0.5424          | 7348944           |
| 0.5415        | 3.1824 | 700  | 0.5388          | 8570128           |
| 0.5477        | 3.6385 | 800  | 0.5335          | 9801744           |
| 0.5305        | 4.0912 | 900  | 0.5296          | 11019520          |
| 0.5281        | 4.5473 | 1000 | 0.5262          | 12247168          |
Framework versions
- PEFT 0.14.0
- Transformers 4.48.3
- PyTorch 2.3.1+cu121
- Datasets 3.5.0
- Tokenizers 0.21.0
Model tree for jbenbudd/ADPrLlama
- Base model: GreatCaptainNemo/ProLLaMA_Stage_1