BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e

Second epoch of fine-tuning on the same dataset, with a different random seed.

This model is a fine-tuned version of BEE-spoke-data/tFINE-900m-e16-d32-instruct on the pszemraj/infinity-instruct-7m-T2T_en dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1159
  • Num Input Tokens Seen: 810839096
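A minimal inference sketch with 🤗 Transformers is shown below. It assumes the checkpoint loads with the standard seq2seq auto classes and that a plain instruction string is an acceptable prompt; the exact prompt template and generation settings used by the authors are not documented here, so the values below are illustrative.

```python
# Minimal inference sketch (assumptions: standard seq2seq auto classes work for
# this checkpoint; the prompt format and sampling settings are illustrative).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a response; max_new_tokens/temperature/top_p are example values,
# not settings confirmed by the model authors.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```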

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 6969
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
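For reference, here is a sketch of how these values map onto `Seq2SeqTrainingArguments`, assuming the run used the Hugging Face `Trainer` across 2 GPUs; the actual training script is not reproduced here, and `output_dir` is a placeholder.

```python
# Hypothetical reconstruction of the listed hyperparameters as Trainer arguments.
# Effective train batch size: 4 (per device) x 2 (GPUs) x 16 (grad accumulation) = 128.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tFINE-900m-e16-d32-instruct_2e",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
    seed=6969,
    optim="adamw_torch",  # Adam with betas=(0.9, 0.999), epsilon=1e-08 (defaults)
)
```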

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 1.234         | 0.0969 | 2000  | 1.2439          | 78067836          |
| 1.2248        | 0.1938 | 4000  | 1.2256          | 156868756         |
| 1.2024        | 0.2907 | 6000  | 1.2009          | 235148092         |
| 1.2074        | 0.3876 | 8000  | 1.1777          | 313452856         |
| 1.1617        | 0.4845 | 10000 | 1.1597          | 392316428         |
| 1.1755        | 0.5815 | 12000 | 1.1437          | 471101508         |
| 1.1473        | 0.6784 | 14000 | 1.1321          | 549831184         |
| 1.1743        | 0.7753 | 16000 | 1.1244          | 628937800         |
| 1.137         | 0.8722 | 18000 | 1.1179          | 707117360         |
| 1.0713        | 0.9691 | 20000 | 1.1160          | 785755388         |
Model size: 887M parameters (F32, safetensors)
