---
license: apache-2.0
datasets:
  - teknium/OpenHermes-2.5
tags:
  - axolotl
  - 01-ai/Yi-1.5-9B-Chat
  - finetune
  - gguf
---

# Hermes-2.5-Yi-1.5-9B-Chat-GGUF

This model is a fine-tuned version of [01-ai/Yi-1.5-9B-Chat](https://huggingface.co/01-ai/Yi-1.5-9B-Chat) on the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset. I'm very happy with the results. The model now seems a lot smarter and more "aware" in certain situations, and it gained quite a big edge on the AGIEval benchmark for models in its class.

I plan to extend its context length to 32k with PoSE. This is the GGUF repo; you can find the main repo at [Hermes-2.5-Yi-1.5-9B-Chat](https://huggingface.co/juvi21/Hermes-2.5-Yi-1.5-9B-Chat).
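
To get started quickly, here is a minimal sketch of downloading and loading one of the quants with `llama-cpp-python`. The Q4_K_M filename below is a placeholder; check this repo's file list for the quantizations that are actually published.

```python
# Hedged sketch: download and load a GGUF quant with llama-cpp-python.
# The filename below is hypothetical -- check the repo's file list for
# the quantizations that actually exist.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF",
    filename="hermes-2.5-yi-1.5-9b-chat.Q4_K_M.gguf",  # hypothetical filename
)

llm = Llama(
    model_path=gguf_path,
    n_ctx=8192,            # matches the training sequence length (see Model Details)
    chat_format="chatml",  # the model uses the chatml template
)
```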

## Model Details

- **Base Model:** [01-ai/Yi-1.5-9B-Chat](https://huggingface.co/01-ai/Yi-1.5-9B-Chat)
- **Chat template:** chatml (see the formatting sketch after this list)
- **Dataset:** [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
- **Sequence Length:** 8192 tokens
- **Training:**
  - Epochs: 1
  - Hardware: 4 nodes x 4 NVIDIA A100 40GB GPUs
  - Duration: 48:32:13
  - Cluster: KIT SCC Cluster
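
Since the model uses the chatml template, prompts can be rendered with the tokenizer from the main repo. A minimal sketch, assuming the main repo's tokenizer config ships the chatml chat template (verify against the repo before relying on it):

```python
# Minimal sketch: render a chatml prompt with the transformers tokenizer.
# Assumption: the main (non-GGUF) repo's tokenizer includes the chatml
# chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Knock Knock, who is there?"},
]

# Produces <|im_start|>/<|im_end|> turns plus the assistant header,
# matching the chatml example shown later in this card.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```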

## Benchmarks (n_shots=0)


| Benchmark       | Score  |
|-----------------|--------|
| ARC (Challenge) | 52.47% |
| ARC (Easy)      | 81.65% |
| BoolQ           | 87.22% |
| HellaSwag       | 60.52% |
| OpenBookQA      | 33.60% |
| PIQA            | 81.12% |
| Winogrande      | 72.22% |
| AGIEval         | 38.46% |
| TruthfulQA      | 44.22% |
| MMLU            | 59.72% |
| IFEval          | 47.96% |

For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
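
The full results below are formatted like `lm-evaluation-harness` output. A hedged sketch of how a zero-shot run over the same task list could be reproduced; that the card's numbers came from this harness is an inference from the table format, and the exact harness version and model arguments are assumptions:

```python
# Hedged sketch: re-run the zero-shot suite with lm-evaluation-harness
# (pip install lm-eval). Harness version and model_args are assumptions;
# scores may differ from the card depending on both.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=juvi21/Hermes-2.5-Yi-1.5-9B-Chat,dtype=bfloat16",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag", "openbookqa",
        "piqa", "winogrande", "agieval", "truthfulqa", "mmlu", "ifeval",
    ],
    num_fewshot=0,
)
print(results["results"])
```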

## GGUF and Quantizations

### chatml

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there! <|im_end|>
```
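
With `chat_format="chatml"`, `llama-cpp-python` applies this wrapping automatically, so the loading sketch from the top of this card can be continued with a plain messages list:

```python
# Continuing the earlier llama-cpp-python sketch: create_chat_completion
# wraps the messages in the <|im_start|>/<|im_end|> structure shown above.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock Knock, who is there?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```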

## License

This model is released under the Apache 2.0 license.

## Acknowledgements

Special thanks to:

- Teknium for the great OpenHermes-2.5 dataset
- 01-ai for their great model
- KIT SCC for FLOPS

## Citation

If you use this model in your research, consider citing it. In any case, definitely cite NousResearch and 01-ai:

```bibtex
@misc{juvi21-hermes-2.5-yi-1.5-9b-chat,
  author = {juvi21},
  title  = {Hermes-2.5-Yi-1.5-9B-Chat},
  year   = {2024},
}
```

## Full Benchmark Results

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| agieval | N/A | none | 0 | acc | 0.5381 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.5715 | ± 0.0056 |
| - agieval_aqua_rat | 1 | none | 0 | acc | 0.3858 | ± 0.0306 |
| | | none | 0 | acc_norm | 0.3425 | ± 0.0298 |
| - agieval_gaokao_biology | 1 | none | 0 | acc | 0.6048 | ± 0.0338 |
| | | none | 0 | acc_norm | 0.6000 | ± 0.0339 |
| - agieval_gaokao_chemistry | 1 | none | 0 | acc | 0.4879 | ± 0.0348 |
| | | none | 0 | acc_norm | 0.4106 | ± 0.0343 |
| - agieval_gaokao_chinese | 1 | none | 0 | acc | 0.5935 | ± 0.0314 |
| | | none | 0 | acc_norm | 0.5813 | ± 0.0315 |
| - agieval_gaokao_english | 1 | none | 0 | acc | 0.8235 | ± 0.0218 |
| | | none | 0 | acc_norm | 0.8431 | ± 0.0208 |
| - agieval_gaokao_geography | 1 | none | 0 | acc | 0.7085 | ± 0.0323 |
| | | none | 0 | acc_norm | 0.6985 | ± 0.0326 |
| - agieval_gaokao_history | 1 | none | 0 | acc | 0.7830 | ± 0.0269 |
| | | none | 0 | acc_norm | 0.7660 | ± 0.0277 |
| - agieval_gaokao_mathcloze | 1 | none | 0 | acc | 0.0508 | ± 0.0203 |
| - agieval_gaokao_mathqa | 1 | none | 0 | acc | 0.3761 | ± 0.0259 |
| | | none | 0 | acc_norm | 0.3590 | ± 0.0256 |
| - agieval_gaokao_physics | 1 | none | 0 | acc | 0.4950 | ± 0.0354 |
| | | none | 0 | acc_norm | 0.4700 | ± 0.0354 |
| - agieval_jec_qa_ca | 1 | none | 0 | acc | 0.6557 | ± 0.0150 |
| | | none | 0 | acc_norm | 0.5926 | ± 0.0156 |
| - agieval_jec_qa_kd | 1 | none | 0 | acc | 0.7310 | ± 0.0140 |
| | | none | 0 | acc_norm | 0.6610 | ± 0.0150 |
| - agieval_logiqa_en | 1 | none | 0 | acc | 0.5177 | ± 0.0196 |
| | | none | 0 | acc_norm | 0.4839 | ± 0.0196 |
| - agieval_logiqa_zh | 1 | none | 0 | acc | 0.4854 | ± 0.0196 |
| | | none | 0 | acc_norm | 0.4501 | ± 0.0195 |
| - agieval_lsat_ar | 1 | none | 0 | acc | 0.2913 | ± 0.0300 |
| | | none | 0 | acc_norm | 0.2696 | ± 0.0293 |
| - agieval_lsat_lr | 1 | none | 0 | acc | 0.7196 | ± 0.0199 |
| | | none | 0 | acc_norm | 0.6824 | ± 0.0206 |
| - agieval_lsat_rc | 1 | none | 0 | acc | 0.7212 | ± 0.0274 |
| | | none | 0 | acc_norm | 0.6989 | ± 0.0280 |
| - agieval_math | 1 | none | 0 | acc | 0.0910 | ± 0.0091 |
| - agieval_sat_en | 1 | none | 0 | acc | 0.8204 | ± 0.0268 |
| | | none | 0 | acc_norm | 0.8301 | ± 0.0262 |
| - agieval_sat_en_without_passage | 1 | none | 0 | acc | 0.5194 | ± 0.0349 |
| | | none | 0 | acc_norm | 0.4806 | ± 0.0349 |
| - agieval_sat_math | 1 | none | 0 | acc | 0.5864 | ± 0.0333 |
| | | none | 0 | acc_norm | 0.5409 | ± 0.0337 |
| arc_challenge | 1 | none | 0 | acc | 0.5648 | ± 0.0145 |
| | | none | 0 | acc_norm | 0.5879 | ± 0.0144 |
| arc_easy | 1 | none | 0 | acc | 0.8241 | ± 0.0078 |
| | | none | 0 | acc_norm | 0.8165 | ± 0.0079 |
| boolq | 2 | none | 0 | acc | 0.8624 | ± 0.0060 |
| hellaswag | 1 | none | 0 | acc | 0.5901 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.7767 | ± 0.0042 |
| ifeval | 2 | none | 0 | inst_level_loose_acc | 0.5156 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.4748 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.3863 | ± 0.0210 |
| | | none | 0 | prompt_level_strict_acc | 0.3309 | ± 0.0202 |
| mmlu | N/A | none | 0 | acc | 0.6942 | ± 0.0037 |
| - abstract_algebra | 0 | none | 0 | acc | 0.4900 | ± 0.0502 |
| - anatomy | 0 | none | 0 | acc | 0.6815 | ± 0.0402 |
| - astronomy | 0 | none | 0 | acc | 0.7895 | ± 0.0332 |
| - business_ethics | 0 | none | 0 | acc | 0.7600 | ± 0.0429 |
| - clinical_knowledge | 0 | none | 0 | acc | 0.7132 | ± 0.0278 |
| - college_biology | 0 | none | 0 | acc | 0.8056 | ± 0.0331 |
| - college_chemistry | 0 | none | 0 | acc | 0.5300 | ± 0.0502 |
| - college_computer_science | 0 | none | 0 | acc | 0.6500 | ± 0.0479 |
| - college_mathematics | 0 | none | 0 | acc | 0.4100 | ± 0.0494 |
| - college_medicine | 0 | none | 0 | acc | 0.6763 | ± 0.0357 |
| - college_physics | 0 | none | 0 | acc | 0.5000 | ± 0.0498 |
| - computer_security | 0 | none | 0 | acc | 0.8200 | ± 0.0386 |
| - conceptual_physics | 0 | none | 0 | acc | 0.7489 | ± 0.0283 |
| - econometrics | 0 | none | 0 | acc | 0.5877 | ± 0.0463 |
| - electrical_engineering | 0 | none | 0 | acc | 0.6759 | ± 0.0390 |
| - elementary_mathematics | 0 | none | 0 | acc | 0.6481 | ± 0.0246 |
| - formal_logic | 0 | none | 0 | acc | 0.5873 | ± 0.0440 |
| - global_facts | 0 | none | 0 | acc | 0.3900 | ± 0.0490 |
| - high_school_biology | 0 | none | 0 | acc | 0.8613 | ± 0.0197 |
| - high_school_chemistry | 0 | none | 0 | acc | 0.6453 | ± 0.0337 |
| - high_school_computer_science | 0 | none | 0 | acc | 0.8300 | ± 0.0378 |
| - high_school_european_history | 0 | none | 0 | acc | 0.8182 | ± 0.0301 |
| - high_school_geography | 0 | none | 0 | acc | 0.8485 | ± 0.0255 |
| - high_school_government_and_politics | 0 | none | 0 | acc | 0.8964 | ± 0.0220 |
| - high_school_macroeconomics | 0 | none | 0 | acc | 0.7923 | ± 0.0206 |
| - high_school_mathematics | 0 | none | 0 | acc | 0.4407 | ± 0.0303 |
| - high_school_microeconomics | 0 | none | 0 | acc | 0.8655 | ± 0.0222 |
| - high_school_physics | 0 | none | 0 | acc | 0.5298 | ± 0.0408 |
| - high_school_psychology | 0 | none | 0 | acc | 0.8679 | ± 0.0145 |
| - high_school_statistics | 0 | none | 0 | acc | 0.6898 | ± 0.0315 |
| - high_school_us_history | 0 | none | 0 | acc | 0.8873 | ± 0.0222 |
| - high_school_world_history | 0 | none | 0 | acc | 0.8312 | ± 0.0244 |
| - human_aging | 0 | none | 0 | acc | 0.7085 | ± 0.0305 |
| - human_sexuality | 0 | none | 0 | acc | 0.7557 | ± 0.0377 |
| - humanities | N/A | none | 0 | acc | 0.6323 | ± 0.0067 |
| - international_law | 0 | none | 0 | acc | 0.8099 | ± 0.0358 |
| - jurisprudence | 0 | none | 0 | acc | 0.7685 | ± 0.0408 |
| - logical_fallacies | 0 | none | 0 | acc | 0.7975 | ± 0.0316 |
| - machine_learning | 0 | none | 0 | acc | 0.5179 | ± 0.0474 |
| - management | 0 | none | 0 | acc | 0.8835 | ± 0.0318 |
| - marketing | 0 | none | 0 | acc | 0.9017 | ± 0.0195 |
| - medical_genetics | 0 | none | 0 | acc | 0.8000 | ± 0.0402 |
| - miscellaneous | 0 | none | 0 | acc | 0.8225 | ± 0.0137 |
| - moral_disputes | 0 | none | 0 | acc | 0.7283 | ± 0.0239 |
| - moral_scenarios | 0 | none | 0 | acc | 0.4860 | ± 0.0167 |
| - nutrition | 0 | none | 0 | acc | 0.7353 | ± 0.0253 |
| - other | N/A | none | 0 | acc | 0.7287 | ± 0.0077 |
| - philosophy | 0 | none | 0 | acc | 0.7170 | ± 0.0256 |
| - prehistory | 0 | none | 0 | acc | 0.7346 | ± 0.0246 |
| - professional_accounting | 0 | none | 0 | acc | 0.5638 | ± 0.0296 |
| - professional_law | 0 | none | 0 | acc | 0.5163 | ± 0.0128 |
| - professional_medicine | 0 | none | 0 | acc | 0.6875 | ± 0.0282 |
| - professional_psychology | 0 | none | 0 | acc | 0.7092 | ± 0.0184 |
| - public_relations | 0 | none | 0 | acc | 0.6727 | ± 0.0449 |
| - security_studies | 0 | none | 0 | acc | 0.7347 | ± 0.0283 |
| - social_sciences | N/A | none | 0 | acc | 0.7910 | ± 0.0072 |
| - sociology | 0 | none | 0 | acc | 0.8060 | ± 0.0280 |
| - stem | N/A | none | 0 | acc | 0.6581 | ± 0.0081 |
| - us_foreign_policy | 0 | none | 0 | acc | 0.8900 | ± 0.0314 |
| - virology | 0 | none | 0 | acc | 0.5301 | ± 0.0389 |
| - world_religions | 0 | none | 0 | acc | 0.8012 | ± 0.0306 |
| openbookqa | 1 | none | 0 | acc | 0.3280 | ± 0.0210 |
| | | none | 0 | acc_norm | 0.4360 | ± 0.0222 |
| piqa | 1 | none | 0 | acc | 0.7982 | ± 0.0094 |
| | | none | 0 | acc_norm | 0.8074 | ± 0.0092 |
| truthfulqa | N/A | none | 0 | acc | 0.4746 | ± 0.0116 |
| | | none | 0 | bleu_acc | 0.4700 | ± 0.0175 |
| | | none | 0 | bleu_diff | 0.3214 | ± 0.6045 |
| | | none | 0 | bleu_max | 22.5895 | ± 0.7122 |
| | | none | 0 | rouge1_acc | 0.4798 | ± 0.0175 |
| | | none | 0 | rouge1_diff | 0.0846 | ± 0.7161 |
| | | none | 0 | rouge1_max | 48.7180 | ± 0.7833 |
| | | none | 0 | rouge2_acc | 0.4149 | ± 0.0172 |
| | | none | 0 | rouge2_diff | -0.4656 | ± 0.8375 |
| | | none | 0 | rouge2_max | 34.0585 | ± 0.8974 |
| | | none | 0 | rougeL_acc | 0.4651 | ± 0.0175 |
| | | none | 0 | rougeL_diff | -0.2804 | ± 0.7217 |
| | | none | 0 | rougeL_max | 45.2232 | ± 0.7971 |
| - truthfulqa_gen | 3 | none | 0 | bleu_acc | 0.4700 | ± 0.0175 |
| | | none | 0 | bleu_diff | 0.3214 | ± 0.6045 |
| | | none | 0 | bleu_max | 22.5895 | ± 0.7122 |
| | | none | 0 | rouge1_acc | 0.4798 | ± 0.0175 |
| | | none | 0 | rouge1_diff | 0.0846 | ± 0.7161 |
| | | none | 0 | rouge1_max | 48.7180 | ± 0.7833 |
| | | none | 0 | rouge2_acc | 0.4149 | ± 0.0172 |
| | | none | 0 | rouge2_diff | -0.4656 | ± 0.8375 |
| | | none | 0 | rouge2_max | 34.0585 | ± 0.8974 |
| | | none | 0 | rougeL_acc | 0.4651 | ± 0.0175 |
| | | none | 0 | rougeL_diff | -0.2804 | ± 0.7217 |
| | | none | 0 | rougeL_max | 45.2232 | ± 0.7971 |
| - truthfulqa_mc1 | 2 | none | 0 | acc | 0.3905 | ± 0.0171 |
| - truthfulqa_mc2 | 2 | none | 0 | acc | 0.5587 | ± 0.0156 |
| winogrande | 1 | none | 0 | acc | 0.7388 | ± 0.0123 |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| agieval | N/A | none | 0 | acc | 0.5381 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.5715 | ± 0.0056 |
| mmlu | N/A | none | 0 | acc | 0.6942 | ± 0.0037 |
| - humanities | N/A | none | 0 | acc | 0.6323 | ± 0.0067 |
| - other | N/A | none | 0 | acc | 0.7287 | ± 0.0077 |
| - social_sciences | N/A | none | 0 | acc | 0.7910 | ± 0.0072 |
| - stem | N/A | none | 0 | acc | 0.6581 | ± 0.0081 |
| truthfulqa | N/A | none | 0 | acc | 0.4746 | ± 0.0116 |
| | | none | 0 | bleu_acc | 0.4700 | ± 0.0175 |
| | | none | 0 | bleu_diff | 0.3214 | ± 0.6045 |
| | | none | 0 | bleu_max | 22.5895 | ± 0.7122 |
| | | none | 0 | rouge1_acc | 0.4798 | ± 0.0175 |
| | | none | 0 | rouge1_diff | 0.0846 | ± 0.7161 |
| | | none | 0 | rouge1_max | 48.7180 | ± 0.7833 |
| | | none | 0 | rouge2_acc | 0.4149 | ± 0.0172 |
| | | none | 0 | rouge2_diff | -0.4656 | ± 0.8375 |
| | | none | 0 | rouge2_max | 34.0585 | ± 0.8974 |
| | | none | 0 | rougeL_acc | 0.4651 | ± 0.0175 |
| | | none | 0 | rougeL_diff | -0.2804 | ± 0.7217 |
| | | none | 0 | rougeL_max | 45.2232 | ± 0.7971 |