Tr-Jp-LLM-1.5B

This model is a fine-tuned version of SakanaAI/TinySwallow-1.5B-Instruct on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4726
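
For reference, the model can be loaded like any other Hugging Face causal language model. The snippet below is a minimal, untested sketch: the repository id oriental-lab/Tr-Jp-LLM-1.5B comes from this card, while the chat template (inherited from the instruct base model), dtype, and generation settings are assumptions for illustration only.

```python
# Minimal inference sketch (chat template, dtype, and generation settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "oriental-lab/Tr-Jp-LLM-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Please introduce yourself in Japanese."
messages = [{"role": "user", "content": "日本語で自己紹介してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```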

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding configuration follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
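
The hyperparameters above map onto Hugging Face TrainingArguments roughly as follows. This is a hedged reconstruction for reference, not the original training script: the output directory, bf16 setting, and evaluation/logging cadence are assumptions (the 500-step eval interval is inferred from the results table below).

```python
# Approximate TrainingArguments matching the list above (a sketch, not the original script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Tr-Jp-LLM-1.5B",      # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=16,   # 64 * 16 = 1024 total train batch size on one device
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                        # assumed from the published BF16 weights
    eval_strategy="steps",            # assumed; the card reports validation loss every 500 steps
    eval_steps=500,
    logging_steps=500,
)
```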

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 3.2232        | 0.0393 | 500   | 2.8686          |
| 2.3923        | 0.0787 | 1000  | 2.6141          |
| 2.2842        | 0.1180 | 1500  | 2.5443          |
| 2.2488        | 0.1574 | 2000  | 2.5141          |
| 2.2307        | 0.1967 | 2500  | 2.4977          |
| 2.2199        | 0.2361 | 3000  | 2.4882          |
| 2.2178        | 0.2754 | 3500  | 2.4824          |
| 2.2126        | 0.3147 | 4000  | 2.4790          |
| 2.2119        | 0.3541 | 4500  | 2.4766          |
| 2.2084        | 0.3934 | 5000  | 2.4751          |
| 2.2075        | 0.4328 | 5500  | 2.4741          |
| 2.207         | 0.4721 | 6000  | 2.4735          |
| 2.2062        | 0.5114 | 6500  | 2.4731          |
| 2.2065        | 0.5508 | 7000  | 2.4730          |
| 2.205         | 0.5901 | 7500  | 2.4728          |
| 2.206         | 0.6295 | 8000  | 2.4727          |
| 2.208         | 0.6688 | 8500  | 2.4726          |
| 2.2067        | 0.7082 | 9000  | 2.4727          |
| 2.2057        | 0.7475 | 9500  | 2.4726          |
| 2.2048        | 0.7868 | 10000 | 2.4726          |
| 2.2076        | 0.8262 | 10500 | 2.4726          |
| 2.2069        | 0.8655 | 11000 | 2.4726          |
| 2.2048        | 0.9049 | 11500 | 2.4726          |
| 2.2064        | 0.9442 | 12000 | 2.4726          |
| 2.2074        | 0.9836 | 12500 | 2.4726          |
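
If the reported loss is the usual mean token-level cross-entropy (an assumption; the card does not state the loss definition), the final validation loss corresponds to a perplexity of roughly exp(2.4726) ≈ 11.85:

```python
# Perplexity from mean cross-entropy loss (assuming the reported loss is per-token NLL).
import math
print(math.exp(2.4726))  # ≈ 11.85
```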

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu126
  • Datasets 3.4.1
  • Tokenizers 0.21.1