
train_residue_list_lr_5e-4_5_epochs

This model is a fine-tuned version of GreatCaptainNemo/ProLLaMA_Stage_1 on the adpr_train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5254
  • Num Input Tokens Seen: 13416448
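Because this model is a PEFT adapter rather than a full checkpoint, it must be loaded on top of the GreatCaptainNemo/ProLLaMA_Stage_1 base model. The snippet below is a minimal loading sketch, not an official usage example: the adapter repository id jbenbudd/ADPrLlama, the generation settings, and the placeholder prompt are assumptions.

```python
# Minimal loading sketch (assumptions: adapter repo id "jbenbudd/ADPrLlama",
# generic generation settings; adapt the prompt to the adpr_train task format).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "GreatCaptainNemo/ProLLaMA_Stage_1"  # base model named above
adapter_id = "jbenbudd/ADPrLlama"              # this adapter (assumed repo id)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "..."  # task-specific prompt; the expected format is not documented here
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```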

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 5.0
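
For reference, these settings map roughly onto the TrainingArguments sketch below. This is an illustration only: dataset loading and the LoRA/PEFT configuration are omitted, and the output directory name is an assumption taken from the run name.

```python
# Rough TrainingArguments equivalent of the hyperparameters listed above
# (sketch only; dataset and LoRA/PEFT setup are omitted).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_residue_list_lr_5e-4_5_epochs",  # assumed from the run name
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,   # 16 * 8 = 128 total train batch size
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```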

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 2.5191        | 0.4561 | 100  | 2.8151          | 1230464           |
| 0.7346        | 0.9122 | 200  | 0.6728          | 2455680           |
| 1.9608        | 1.3649 | 300  | 1.7802          | 3673744           |
| 3.2759        | 1.8210 | 400  | 1.0581          | 4901648           |
| 0.5697        | 2.2737 | 500  | 0.5570          | 6120272           |
| 0.5411        | 2.7298 | 600  | 0.5424          | 7348944           |
| 0.5415        | 3.1824 | 700  | 0.5388          | 8570128           |
| 0.5477        | 3.6385 | 800  | 0.5335          | 9801744           |
| 0.5305        | 4.0912 | 900  | 0.5296          | 11019520          |
| 0.5281        | 4.5473 | 1000 | 0.5262          | 12247168          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.48.3
  • PyTorch 2.3.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.21.0