leaderboard-pr-bot's picture
Adding Evaluation Results
5830729 verified
|
raw
history blame
5.16 kB
metadata
license: other
library_name: transformers
tags:
  - llama-factory
  - full
  - generated_from_trainer
base_model: hon9kon9ize/CantoneseLLM-v1.0
model-index:
  - name: CantoneseLLMChat-v1.0-7B
    results: []

CantoneseLLMChat-v1.0-7B

This model is a fine-tuned version of hon9kon9ize/CantoneseLLM-v1.0 on the sft_v1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9464

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.3
  • num_epochs: 3.0

Training results

Training Loss Epoch Step Validation Loss
1.3332 0.0480 100 1.3140
1.2185 0.0960 200 1.2879
1.1976 0.1439 300 1.2533
1.1627 0.1919 400 1.2169
1.178 0.2399 500 1.1766
1.133 0.2879 600 1.1296
1.0466 0.3359 700 1.0983
1.0657 0.3839 800 1.0770
1.054 0.4318 900 1.0617
1.0744 0.4798 1000 1.0487
0.9977 0.5278 1100 1.0383
0.9778 0.5758 1200 1.0290
1.0187 0.6238 1300 1.0211
1.085 0.6717 1400 1.0131
0.958 0.7197 1500 1.0072
1.0482 0.7677 1600 1.0007
0.9447 0.8157 1700 0.9946
1.0 0.8637 1800 0.9894
0.9685 0.9117 1900 0.9849
0.8576 0.9596 2000 0.9807
0.8853 1.0076 2100 0.9775
0.947 1.0556 2200 0.9739
0.9207 1.1036 2300 0.9713
0.8596 1.1516 2400 0.9691
1.0277 1.1995 2500 0.9655
0.9646 1.2475 2600 0.9631
0.8583 1.2955 2700 0.9613
0.9367 1.3435 2800 0.9589
0.9146 1.3915 2900 0.9570
0.9697 1.4395 3000 0.9556
0.8713 1.4874 3100 0.9542
0.9855 1.5354 3200 0.9524
0.8651 1.5834 3300 0.9511
0.9448 1.6314 3400 0.9495
0.8997 1.6794 3500 0.9485
1.0446 1.7273 3600 0.9475
0.8862 1.7753 3700 0.9465
0.873 1.8233 3800 0.9456
0.9893 1.8713 3900 0.9448
0.8915 1.9193 4000 0.9442
0.8854 1.9673 4100 0.9435
0.7608 2.0152 4200 0.9447
0.796 2.0632 4300 0.9464
0.9225 2.1112 4400 0.9467
0.9901 2.1592 4500 0.9467
0.9263 2.2072 4600 0.9468
0.7735 2.2551 4700 0.9467
0.8454 2.3031 4800 0.9464
0.8562 2.3511 4900 0.9466
0.8923 2.3991 5000 0.9464
0.7529 2.4471 5100 0.9463
0.8421 2.4951 5200 0.9463
0.8578 2.5430 5300 0.9463
0.8143 2.5910 5400 0.9464
0.8117 2.6390 5500 0.9463
0.861 2.6870 5600 0.9464
0.8415 2.7350 5700 0.9463
0.7846 2.7829 5800 0.9463
0.7605 2.8309 5900 0.9464
0.8721 2.8789 6000 0.9464
0.8566 2.9269 6100 0.9464
0.7978 2.9749 6200 0.9464

Framework versions

  • Transformers 4.45.0
  • Pytorch 2.4.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.20.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 22.98
IFEval (0-Shot) 44.55
BBH (3-Shot) 28.54
MATH Lvl 5 (4-Shot) 17.90
GPQA (0-shot) 9.62
MuSR (0-shot) 6.30
MMLU-PRO (5-shot) 30.94