Aris-375M

This model is a fine-tuned version of joseph-ai/Aris-375M on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.7763 (final validation loss, step 7630)

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 4
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 200
training_steps: 7630
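
The relationship between the batch-size entries and the shape of the learning-rate schedule can be sketched in plain Python. This is a minimal illustration, not the training code: the function names are hypothetical, and the schedule assumes the common "linear warmup, then half-cosine decay to zero" form, which the card does not state explicitly.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
NUM_DEVICES = 2
GRADIENT_ACCUMULATION_STEPS = 16
BASE_LR = 5e-05
WARMUP_STEPS = 200
TRAINING_STEPS = 7630


def effective_batch_size(per_device: int, devices: int, accumulation: int) -> int:
    """Samples contributing to one optimizer step (hypothetical helper)."""
    return per_device * devices * accumulation


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Learning rate at `step`, ASSUMING linear warmup + half-cosine decay to 0."""
    if step < warmup:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # 2 per device * 2 devices * 16 accumulation steps = 64
    print(effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE,
                               NUM_DEVICES,
                               GRADIENT_ACCUMULATION_STEPS))
    for s in (0, 100, 200, 4000, 7630):
        print(s, cosine_lr(s))
```

Under this assumed schedule the learning rate approaches zero near step 7630, which would be consistent with the validation loss flattening at about 3.776 over the last few thousand steps.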

Training Loss	Epoch	Step	Validation Loss
3.8660	0.0655	500	3.9581
3.7762	0.1311	1000	3.8790
3.7488	0.1966	1500	3.8372
3.7214	0.2621	2000	3.8135
3.6299	0.3277	2500	3.7986
3.7139	0.3932	3000	3.7894
3.7061	0.4587	3500	3.7832
3.7101	0.5242	4000	3.7798
3.7447	0.5898	4500	3.7778
3.7484	0.6553	5000	3.7768
3.7037	0.7208	5500	3.7765
3.7168	0.7864	6000	3.7763
3.7784	0.8519	6500	3.7762
3.6824	0.9174	7000	3.7762
3.6204	0.9830	7500	3.7762
3.7134	1.0	7630	3.7763
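
To make the loss values easier to interpret, they can be converted to perplexity. The sketch below assumes the reported loss is a mean token-level cross-entropy in nats (the usual case for causal language-model fine-tuning, but not stated in this card); the helper names are hypothetical.

```python
import math

# (step, validation_loss) pairs copied from the results table above.
VALIDATION_LOSS = [
    (500, 3.9581), (1000, 3.8790), (1500, 3.8372), (2000, 3.8135),
    (2500, 3.7986), (3000, 3.7894), (3500, 3.7832), (4000, 3.7798),
    (4500, 3.7778), (5000, 3.7768), (5500, 3.7765), (6000, 3.7763),
    (6500, 3.7762), (7000, 3.7762), (7500, 3.7762), (7630, 3.7763),
]


def perplexity(loss: float) -> float:
    """Perplexity for a cross-entropy loss measured in nats (assumption)."""
    return math.exp(loss)


def best_checkpoint(rows):
    """Return the first (step, loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    first_step, first_loss = VALIDATION_LOSS[0]
    last_step, last_loss = VALIDATION_LOSS[-1]
    print(f"step {first_step}: perplexity {perplexity(first_loss):.2f}")
    print(f"step {last_step}: perplexity {perplexity(last_loss):.2f}")
    print("lowest validation loss first reached at:", best_checkpoint(VALIDATION_LOSS))
```

Under that assumption, the final validation loss of 3.7763 corresponds to a perplexity of roughly 43.7, and the lowest logged value (3.7762) is first reached at step 6500.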

Safetensors

Model size: 0.4B params
Tensor type: BF16
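
As a rough guide to how much memory the weights alone occupy, the parameter count can be multiplied by the bytes per parameter. This is a back-of-the-envelope sketch: the 0.4B figure is rounded, the 375M figure is only inferred from the model name, and activations, KV cache, and optimizer state are ignored. The helper name is hypothetical.

```python
# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"BF16": 2, "F16": 2, "F32": 4}


def weight_memory_gb(num_params: float, tensor_type: str = "BF16") -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 1e9


if __name__ == "__main__":
    # Rounded count shown on the card.
    print(weight_memory_gb(0.4e9, "BF16"))
    # Count suggested by the model name (assumption, not confirmed by the card).
    print(weight_memory_gb(375e6, "BF16"))
```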
