mega-ar-126m-v12-python-apps-4096

This model is a fine-tuned version of pszemraj/mega-ar-126m-v12-KIx3. The training dataset is not recorded in this card (the auto-generated metadata lists it as "None"). It achieves the following results on the evaluation set:

  • Loss: 1.3247
  • Accuracy: 0.7266

Model description

This is just a test model; the dataset it was tuned on is rather narrow. Note that it has a 4096-token context window, so it may be worth giving it a prompt of around 2k tokens and seeing how it completes it.
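
The suggestion above can be sketched as follows. This is a minimal example, not a tested recipe: it assumes the checkpoint loads through the standard transformers auto classes, and the repository id pszemraj/mega-ar-126m-v12-python-apps-4096 is inferred from the card title and the base model's namespace rather than stated in the card. The names completion_budget and complete are hypothetical helpers introduced for illustration.

```python
CONTEXT_LENGTH = 4096  # context window stated in the card
PROMPT_TOKENS = 2048   # "2k tokens or so", as the card suggests


def completion_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens still fit in the context window after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def complete(
    prompt: str,
    model_id: str = "pszemraj/mega-ar-126m-v12-python-apps-4096",  # assumed repo id
    max_new_tokens: int = 256,
) -> str:
    """Generate a greedy continuation of `prompt` (sketch only)."""
    # Imported lazily so completion_budget works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    encoded = tokenizer(prompt, return_tensors="pt")
    # Keep the last ~2k tokens so the text nearest the completion point survives.
    input_ids = encoded["input_ids"][:, -PROMPT_TOKENS:]
    attention_mask = encoded["attention_mask"][:, -PROMPT_TOKENS:]
    prompt_len = input_ids.shape[1]

    output_ids = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=min(max_new_tokens, completion_budget(prompt_len)),
        do_sample=False,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

With a 2048-token prompt, completion_budget leaves 2048 tokens for the continuation, so the 256-token default stays well inside the window.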

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 7427
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 2.0
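
As a rough illustration of how these values fit together, the sketch below recomputes the total batch size and the shape of a linear schedule with warmup. It rests on two assumptions: a single device (inferred from 4 × 32 = 128) and about 312 optimizer steps in total (estimated from the results table, where step 50 falls at epoch 0.32). The function names are hypothetical, and the schedule is a plain re-implementation of linear warmup and decay, not the trainer's own code.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-4,
              warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


TOTAL_STEPS = 312  # assumption: estimated from the results table, not stated in the card

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(4, 32))
    for step in (0, 8, 16, 150, 300, TOTAL_STEPS):
        print(f"step {step:>3}: lr = {linear_lr(step, TOTAL_STEPS):.6f}")
```

Under these assumptions, warmup covers the first 16 steps, after which the rate falls linearly from 0.0002 to zero.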

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy
1.2235        | 0.32  | 50   | 1.5113          | 0.6991
1.1655        | 0.64  | 100  | 1.4277          | 0.7109
1.1171        | 0.96  | 150  | 1.3812          | 0.7183
1.0725        | 1.28  | 200  | 1.3539          | 0.7220
1.0304        | 1.60  | 250  | 1.3356          | 0.7246
0.9842        | 1.92  | 300  | 1.3247          | 0.7266
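
The validation losses can be read as perplexities, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, though the card does not say so). The helper name perplexity is introduced here for illustration.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Validation loss at each evaluation step, copied from the results table.
eval_losses = {
    50: 1.5113,
    100: 1.4277,
    150: 1.3812,
    200: 1.3539,
    250: 1.3356,
    300: 1.3247,
}

if __name__ == "__main__":
    for step, loss in eval_losses.items():
        print(f"step {step:>3}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, the final validation loss of 1.3247 corresponds to a perplexity of roughly 3.76.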

Framework versions

  • Transformers 4.33.3
  • Pytorch 2.2.0.dev20231017+cu121
  • Datasets 2.14.5
  • Tokenizers 0.13.3

Safetensors

  • Model size: 126M params
  • Tensor type: F32