
sheared_llama_1.3b-reazon_v2-ja_en_trans-T2T

This model is a fine-tuned version of princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT on an unknown dataset. At the final training step (200), it achieves the following result on the evaluation set:

  • Loss: 1.3844

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 1024
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • training_steps: 200
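
The derived values in this list (total batch sizes) follow from the per-device settings, and the warmup ratio together with the cosine scheduler determines the learning rate at each step. The sketch below recomputes them in plain Python. It assumes the usual warmup-then-cosine shape (linear warmup from 0, then a half-cosine decay to 0) and that the warmup step count is the ratio times the total steps, rounded up; the function name `lr_at` is hypothetical and is not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 2e-05
train_batch_size = 32            # per device
eval_batch_size = 8              # per device
num_devices = 8
gradient_accumulation_steps = 4
warmup_ratio = 0.01
training_steps = 200

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: warmup steps = ceil(ratio * total steps).
warmup_steps = math.ceil(training_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at an optimizer step.

    Assumed schedule: linear warmup from 0 to the peak rate,
    then a half-cosine decay from the peak down to 0.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    print("warmup_steps:", warmup_steps)
    for step in (0, 1, 2, 50, 100, 150, 200):
        print(step, lr_at(step))
```

Under these assumptions the computed totals agree with the listed `total_train_batch_size` and `total_eval_batch_size`, and the warmup covers only the first couple of optimizer steps, so nearly the whole run is spent on the cosine decay.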

Training results

Training Loss   Epoch    Step   Validation Loss
2.0409          0.0114   10     1.7202
1.6901          0.0229   20     1.6101
1.5859          0.0343   30     1.5446
1.5533          0.0458   40     1.5029
1.4937          0.0572   50     1.4722
1.4802          0.0687   60     1.4492
1.4484          0.0801   70     1.4302
1.4292          0.0916   80     1.4183
1.4203          0.1030   90     1.4078
1.4184          0.1145   100    1.3985
1.4045          0.1259   110    1.3923
1.4125          0.1374   120    1.3886
1.4098          0.1488   130    1.3877
1.3921          0.1603   140    1.3859
1.3984          0.1717   150    1.3851
1.3858          0.1832   160    1.3845
1.3995          0.1946   170    1.3842
1.3943          0.2061   180    1.3847
1.3988          0.2175   190    1.3844
1.3969          0.2290   200    1.3844
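
A few quantities can be read off this table with simple arithmetic. The sketch below finds the step with the lowest validation loss, converts the final validation loss to a perplexity, and estimates the number of optimizer steps in one full epoch. It assumes the loss is a mean per-token cross-entropy in nats (common for causal language model fine-tuning, but not stated in this card), and the epoch estimate is rough because the Epoch column is rounded to four decimals. The effective batch size of 1024 is taken from the hyperparameter list.

```python
import math

# (training_loss, epoch, step, validation_loss) rows copied from the table above.
rows = [
    (2.0409, 0.0114, 10, 1.7202),
    (1.6901, 0.0229, 20, 1.6101),
    (1.5859, 0.0343, 30, 1.5446),
    (1.5533, 0.0458, 40, 1.5029),
    (1.4937, 0.0572, 50, 1.4722),
    (1.4802, 0.0687, 60, 1.4492),
    (1.4484, 0.0801, 70, 1.4302),
    (1.4292, 0.0916, 80, 1.4183),
    (1.4203, 0.1030, 90, 1.4078),
    (1.4184, 0.1145, 100, 1.3985),
    (1.4045, 0.1259, 110, 1.3923),
    (1.4125, 0.1374, 120, 1.3886),
    (1.4098, 0.1488, 130, 1.3877),
    (1.3921, 0.1603, 140, 1.3859),
    (1.3984, 0.1717, 150, 1.3851),
    (1.3858, 0.1832, 160, 1.3845),
    (1.3995, 0.1946, 170, 1.3842),
    (1.3943, 0.2061, 180, 1.3847),
    (1.3988, 0.2175, 190, 1.3844),
    (1.3969, 0.2290, 200, 1.3844),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
final = rows[-1]

# Assumption: loss is mean per-token cross-entropy in nats, so exp(loss) is perplexity.
final_perplexity = math.exp(final[3])

# Rough estimate: steps completed divided by the fraction of an epoch covered.
steps_per_epoch = final[2] / final[1]
examples_per_epoch = steps_per_epoch * 1024  # effective train batch size

if __name__ == "__main__":
    print("best step:", best[2], "validation loss:", best[3])
    print("final perplexity (assumed nats):", round(final_perplexity, 3))
    print("estimated steps per epoch:", round(steps_per_epoch))
    print("estimated examples per epoch:", round(examples_per_epoch))
```

The validation loss is essentially flat from step 160 onward, with the minimum at step 170 and a final value only 0.0002 higher, and the 200 steps cover less than a quarter of one epoch.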

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.35B params
  • Tensor type: BF16