Tr-Jp-LLM-1.5B

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4726
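
For reference, the model can be loaded like any other Hugging Face causal language model. The snippet below is a minimal, untested sketch: the repository id oriental-lab/Tr-Jp-LLM-1.5B comes from this card, while the chat template (inherited from the instruct base model), dtype, and generation settings are assumptions for illustration only.

```python
# Minimal inference sketch (chat template, dtype, and generation settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "oriental-lab/Tr-Jp-LLM-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Please introduce yourself in Japanese."
messages = [{"role": "user", "content": "日本語で自己紹介してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```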

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding configuration follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
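
The hyperparameters above map onto Hugging Face TrainingArguments roughly as follows. This is a hedged reconstruction for reference, not the original training script: the output directory, bf16 setting, and evaluation/logging cadence are assumptions (the 500-step eval interval is inferred from the results table below).

```python
# Approximate TrainingArguments matching the list above (a sketch, not the original script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Tr-Jp-LLM-1.5B",      # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=16,   # 64 * 16 = 1024 total train batch size on one device
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                        # assumed from the published BF16 weights
    eval_strategy="steps",            # assumed; the card reports validation loss every 500 steps
    eval_steps=500,
    logging_steps=500,
)
```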

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 3.2232        | 0.0393 | 500   | 2.8686          |
| 2.3923        | 0.0787 | 1000  | 2.6141          |
| 2.2842        | 0.1180 | 1500  | 2.5443          |
| 2.2488        | 0.1574 | 2000  | 2.5141          |
| 2.2307        | 0.1967 | 2500  | 2.4977          |
| 2.2199        | 0.2361 | 3000  | 2.4882          |
| 2.2178        | 0.2754 | 3500  | 2.4824          |
| 2.2126        | 0.3147 | 4000  | 2.4790          |
| 2.2119        | 0.3541 | 4500  | 2.4766          |
| 2.2084        | 0.3934 | 5000  | 2.4751          |
| 2.2075        | 0.4328 | 5500  | 2.4741          |
| 2.207         | 0.4721 | 6000  | 2.4735          |
| 2.2062        | 0.5114 | 6500  | 2.4731          |
| 2.2065        | 0.5508 | 7000  | 2.4730          |
| 2.205         | 0.5901 | 7500  | 2.4728          |
| 2.206         | 0.6295 | 8000  | 2.4727          |
| 2.208         | 0.6688 | 8500  | 2.4726          |
| 2.2067        | 0.7082 | 9000  | 2.4727          |
| 2.2057        | 0.7475 | 9500  | 2.4726          |
| 2.2048        | 0.7868 | 10000 | 2.4726          |
| 2.2076        | 0.8262 | 10500 | 2.4726          |
| 2.2069        | 0.8655 | 11000 | 2.4726          |
| 2.2048        | 0.9049 | 11500 | 2.4726          |
| 2.2064        | 0.9442 | 12000 | 2.4726          |
| 2.2074        | 0.9836 | 12500 | 2.4726          |
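
If the reported loss is the usual mean token-level cross-entropy (an assumption; the card does not state the loss definition), the final validation loss corresponds to a perplexity of roughly exp(2.4726) ≈ 11.85:

```python
# Perplexity from mean cross-entropy loss (assuming the reported loss is per-token NLL).
import math
print(math.exp(2.4726))  # ≈ 11.85
```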

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu126
  • Datasets 3.4.1
  • Tokenizers 0.21.1