Hermes-2.5-Yi-1.5-9B-Chat
This model is a fine-tuned version of 01-ai/Yi-1.5-9B-Chat on the teknium/OpenHermes-2.5 dataset. I'm very happy with the results. The model now seems a lot smarter and more "aware" in certain situations (this is a first look, so my opinion may change with more usage). It has a sizeable edge on the AGIEval benchmark over other models in its class. I plan to extend its context length to 32k with PoSE.
Model Details
- Base Model: 01-ai/Yi-1.5-9B-Chat
- Chat Template: ChatML
- Dataset: teknium/OpenHermes-2.5
- Sequence Length: 8192 tokens
- Training:
  - Epochs: 1
  - Hardware: 4 nodes x 4 NVIDIA A100 40GB GPUs
  - Duration: 48:32:13
  - Cluster: KIT SCC Cluster
Benchmarks (n_shots = 0)
Benchmark | Score |
---|---|
ARC (Challenge) | 52.47% |
ARC (Easy) | 81.65% |
BoolQ | 87.22% |
HellaSwag | 60.52% |
OpenBookQA | 33.60% |
PIQA | 81.12% |
Winogrande | 72.22% |
AGIEval | 38.46% |
TruthfulQA | 44.22% |
MMLU | 59.72% |
IFEval | 47.96% |
For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
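The detailed table is raw output from EleutherAI's lm-evaluation-harness. As a hedged sketch (not the exact command used for this card), a comparable zero-shot run could look roughly like this with the harness's v0.4.x Python API; the task list, dtype, and batch size are illustrative choices, not reproduced settings:

```python
import lm_eval

# Illustrative zero-shot evaluation sketch (assumptions: lm-evaluation-harness
# v0.4.x, a GPU with enough memory, and these example task/group names).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=juvi21/Hermes-2.5-Yi-1.5-9B-Chat,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "mmlu", "agieval"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```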
GGUF and Quantizations
- llama.cpp release b3166
- juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF is available in the quantizations below (see the usage sketch after the list):
  - F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K
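As a rough usage sketch for the quantized files, assuming the llama-cpp-python bindings (the local filename below is hypothetical; adjust it to whichever quant you download):

```python
from llama_cpp import Llama

# Hypothetical local path; download the desired quant from
# juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF first.
llm = Llama(
    model_path="./Hermes-2.5-Yi-1.5-9B-Chat.Q4_K_M.gguf",
    n_ctx=8192,  # matches the training sequence length
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the question to 42?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

Recent llama.cpp builds should pick up the chat template from the GGUF metadata; if not, pass the prompt pre-formatted in the ChatML layout shown in the Usage section.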
Usage
To use this model, load it with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat")
tokenizer = AutoTokenizer.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat")

# Generate text
input_text = "What is the question to 42?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
The model uses the ChatML prompt format:

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there! <|im_end|>
```
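You usually don't need to assemble this template by hand. A minimal sketch, assuming the tokenizer ships the ChatML template shown above (the system prompt and generation settings are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "juvi21/Hermes-2.5-Yi-1.5-9B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Knock Knock, who is there?"},
]

# Wraps the messages in <|im_start|>/<|im_end|> tags and appends the
# assistant header so the model answers directly.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```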
License
This model is released under the Apache 2.0 license.
Acknowledgements
Special thanks to:
- Teknium for the great OpenHermes-2.5 dataset
- 01-ai for their great model
- KIT SCC for the FLOPS
Citation
If you use this model in your research, please consider citing it. In any case, definitely cite NousResearch and 01-ai:
```bibtex
@misc{juvi21_2024_hermes25_yi15_9b_chat,
  author = {juvi21},
  title  = {Hermes-2.5-Yi-1.5-9B-Chat},
  year   = {2024},
}
```
Full Benchmark Results
Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
agieval | N/A | none | 0 | acc | ↑ | 0.5381 | ± | 0.0049 |
none | 0 | acc_norm | ↑ | 0.5715 | ± | 0.0056 | ||
- agieval_aqua_rat | 1 | none | 0 | acc | ↑ | 0.3858 | ± | 0.0306 |
none | 0 | acc_norm | ↑ | 0.3425 | ± | 0.0298 | ||
- agieval_gaokao_biology | 1 | none | 0 | acc | ↑ | 0.6048 | ± | 0.0338 |
none | 0 | acc_norm | ↑ | 0.6000 | ± | 0.0339 | ||
- agieval_gaokao_chemistry | 1 | none | 0 | acc | ↑ | 0.4879 | ± | 0.0348 |
none | 0 | acc_norm | ↑ | 0.4106 | ± | 0.0343 | ||
- agieval_gaokao_chinese | 1 | none | 0 | acc | ↑ | 0.5935 | ± | 0.0314 |
none | 0 | acc_norm | ↑ | 0.5813 | ± | 0.0315 | ||
- agieval_gaokao_english | 1 | none | 0 | acc | ↑ | 0.8235 | ± | 0.0218 |
none | 0 | acc_norm | ↑ | 0.8431 | ± | 0.0208 | ||
- agieval_gaokao_geography | 1 | none | 0 | acc | ↑ | 0.7085 | ± | 0.0323 |
none | 0 | acc_norm | ↑ | 0.6985 | ± | 0.0326 | ||
- agieval_gaokao_history | 1 | none | 0 | acc | ↑ | 0.7830 | ± | 0.0269 |
none | 0 | acc_norm | ↑ | 0.7660 | ± | 0.0277 | ||
- agieval_gaokao_mathcloze | 1 | none | 0 | acc | ↑ | 0.0508 | ± | 0.0203 |
- agieval_gaokao_mathqa | 1 | none | 0 | acc | ↑ | 0.3761 | ± | 0.0259 |
none | 0 | acc_norm | ↑ | 0.3590 | ± | 0.0256 | ||
- agieval_gaokao_physics | 1 | none | 0 | acc | ↑ | 0.4950 | ± | 0.0354 |
none | 0 | acc_norm | ↑ | 0.4700 | ± | 0.0354 | ||
- agieval_jec_qa_ca | 1 | none | 0 | acc | ↑ | 0.6557 | ± | 0.0150 |
none | 0 | acc_norm | ↑ | 0.5926 | ± | 0.0156 | ||
- agieval_jec_qa_kd | 1 | none | 0 | acc | ↑ | 0.7310 | ± | 0.0140 |
none | 0 | acc_norm | ↑ | 0.6610 | ± | 0.0150 | ||
- agieval_logiqa_en | 1 | none | 0 | acc | ↑ | 0.5177 | ± | 0.0196 |
none | 0 | acc_norm | ↑ | 0.4839 | ± | 0.0196 | ||
- agieval_logiqa_zh | 1 | none | 0 | acc | ↑ | 0.4854 | ± | 0.0196 |
none | 0 | acc_norm | ↑ | 0.4501 | ± | 0.0195 | ||
- agieval_lsat_ar | 1 | none | 0 | acc | ↑ | 0.2913 | ± | 0.0300 |
none | 0 | acc_norm | ↑ | 0.2696 | ± | 0.0293 | ||
- agieval_lsat_lr | 1 | none | 0 | acc | ↑ | 0.7196 | ± | 0.0199 |
none | 0 | acc_norm | ↑ | 0.6824 | ± | 0.0206 | ||
- agieval_lsat_rc | 1 | none | 0 | acc | ↑ | 0.7212 | ± | 0.0274 |
none | 0 | acc_norm | ↑ | 0.6989 | ± | 0.0280 | ||
- agieval_math | 1 | none | 0 | acc | ↑ | 0.0910 | ± | 0.0091 |
- agieval_sat_en | 1 | none | 0 | acc | ↑ | 0.8204 | ± | 0.0268 |
none | 0 | acc_norm | ↑ | 0.8301 | ± | 0.0262 | ||
- agieval_sat_en_without_passage | 1 | none | 0 | acc | ↑ | 0.5194 | ± | 0.0349 |
none | 0 | acc_norm | ↑ | 0.4806 | ± | 0.0349 | ||
- agieval_sat_math | 1 | none | 0 | acc | ↑ | 0.5864 | ± | 0.0333 |
none | 0 | acc_norm | ↑ | 0.5409 | ± | 0.0337 | ||
arc_challenge | 1 | none | 0 | acc | ↑ | 0.5648 | ± | 0.0145 |
none | 0 | acc_norm | ↑ | 0.5879 | ± | 0.0144 | ||
arc_easy | 1 | none | 0 | acc | ↑ | 0.8241 | ± | 0.0078 |
none | 0 | acc_norm | ↑ | 0.8165 | ± | 0.0079 | ||
boolq | 2 | none | 0 | acc | ↑ | 0.8624 | ± | 0.0060 |
hellaswag | 1 | none | 0 | acc | ↑ | 0.5901 | ± | 0.0049 |
none | 0 | acc_norm | ↑ | 0.7767 | ± | 0.0042 | ||
ifeval | 2 | none | 0 | inst_level_loose_acc | ↑ | 0.5156 | ± | N/A |
none | 0 | inst_level_strict_acc | ↑ | 0.4748 | ± | N/A | ||
none | 0 | prompt_level_loose_acc | ↑ | 0.3863 | ± | 0.0210 | ||
none | 0 | prompt_level_strict_acc | ↑ | 0.3309 | ± | 0.0202 | ||
mmlu | N/A | none | 0 | acc | ↑ | 0.6942 | ± | 0.0037 |
- abstract_algebra | 0 | none | 0 | acc | ↑ | 0.4900 | ± | 0.0502 |
- anatomy | 0 | none | 0 | acc | ↑ | 0.6815 | ± | 0.0402 |
- astronomy | 0 | none | 0 | acc | ↑ | 0.7895 | ± | 0.0332 |
- business_ethics | 0 | none | 0 | acc | ↑ | 0.7600 | ± | 0.0429 |
- clinical_knowledge | 0 | none | 0 | acc | ↑ | 0.7132 | ± | 0.0278 |
- college_biology | 0 | none | 0 | acc | ↑ | 0.8056 | ± | 0.0331 |
- college_chemistry | 0 | none | 0 | acc | ↑ | 0.5300 | ± | 0.0502 |
- college_computer_science | 0 | none | 0 | acc | ↑ | 0.6500 | ± | 0.0479 |
- college_mathematics | 0 | none | 0 | acc | ↑ | 0.4100 | ± | 0.0494 |
- college_medicine | 0 | none | 0 | acc | ↑ | 0.6763 | ± | 0.0357 |
- college_physics | 0 | none | 0 | acc | ↑ | 0.5000 | ± | 0.0498 |
- computer_security | 0 | none | 0 | acc | ↑ | 0.8200 | ± | 0.0386 |
- conceptual_physics | 0 | none | 0 | acc | ↑ | 0.7489 | ± | 0.0283 |
- econometrics | 0 | none | 0 | acc | ↑ | 0.5877 | ± | 0.0463 |
- electrical_engineering | 0 | none | 0 | acc | ↑ | 0.6759 | ± | 0.0390 |
- elementary_mathematics | 0 | none | 0 | acc | ↑ | 0.6481 | ± | 0.0246 |
- formal_logic | 0 | none | 0 | acc | ↑ | 0.5873 | ± | 0.0440 |
- global_facts | 0 | none | 0 | acc | ↑ | 0.3900 | ± | 0.0490 |
- high_school_biology | 0 | none | 0 | acc | ↑ | 0.8613 | ± | 0.0197 |
- high_school_chemistry | 0 | none | 0 | acc | ↑ | 0.6453 | ± | 0.0337 |
- high_school_computer_science | 0 | none | 0 | acc | ↑ | 0.8300 | ± | 0.0378 |
- high_school_european_history | 0 | none | 0 | acc | ↑ | 0.8182 | ± | 0.0301 |
- high_school_geography | 0 | none | 0 | acc | ↑ | 0.8485 | ± | 0.0255 |
- high_school_government_and_politics | 0 | none | 0 | acc | ↑ | 0.8964 | ± | 0.0220 |
- high_school_macroeconomics | 0 | none | 0 | acc | ↑ | 0.7923 | ± | 0.0206 |
- high_school_mathematics | 0 | none | 0 | acc | ↑ | 0.4407 | ± | 0.0303 |
- high_school_microeconomics | 0 | none | 0 | acc | ↑ | 0.8655 | ± | 0.0222 |
- high_school_physics | 0 | none | 0 | acc | ↑ | 0.5298 | ± | 0.0408 |
- high_school_psychology | 0 | none | 0 | acc | ↑ | 0.8679 | ± | 0.0145 |
- high_school_statistics | 0 | none | 0 | acc | ↑ | 0.6898 | ± | 0.0315 |
- high_school_us_history | 0 | none | 0 | acc | ↑ | 0.8873 | ± | 0.0222 |
- high_school_world_history | 0 | none | 0 | acc | ↑ | 0.8312 | ± | 0.0244 |
- human_aging | 0 | none | 0 | acc | ↑ | 0.7085 | ± | 0.0305 |
- human_sexuality | 0 | none | 0 | acc | ↑ | 0.7557 | ± | 0.0377 |
- humanities | N/A | none | 0 | acc | ↑ | 0.6323 | ± | 0.0067 |
- international_law | 0 | none | 0 | acc | ↑ | 0.8099 | ± | 0.0358 |
- jurisprudence | 0 | none | 0 | acc | ↑ | 0.7685 | ± | 0.0408 |
- logical_fallacies | 0 | none | 0 | acc | ↑ | 0.7975 | ± | 0.0316 |
- machine_learning | 0 | none | 0 | acc | ↑ | 0.5179 | ± | 0.0474 |
- management | 0 | none | 0 | acc | ↑ | 0.8835 | ± | 0.0318 |
- marketing | 0 | none | 0 | acc | ↑ | 0.9017 | ± | 0.0195 |
- medical_genetics | 0 | none | 0 | acc | ↑ | 0.8000 | ± | 0.0402 |
- miscellaneous | 0 | none | 0 | acc | ↑ | 0.8225 | ± | 0.0137 |
- moral_disputes | 0 | none | 0 | acc | ↑ | 0.7283 | ± | 0.0239 |
- moral_scenarios | 0 | none | 0 | acc | ↑ | 0.4860 | ± | 0.0167 |
- nutrition | 0 | none | 0 | acc | ↑ | 0.7353 | ± | 0.0253 |
- other | N/A | none | 0 | acc | ↑ | 0.7287 | ± | 0.0077 |
- philosophy | 0 | none | 0 | acc | ↑ | 0.7170 | ± | 0.0256 |
- prehistory | 0 | none | 0 | acc | ↑ | 0.7346 | ± | 0.0246 |
- professional_accounting | 0 | none | 0 | acc | ↑ | 0.5638 | ± | 0.0296 |
- professional_law | 0 | none | 0 | acc | ↑ | 0.5163 | ± | 0.0128 |
- professional_medicine | 0 | none | 0 | acc | ↑ | 0.6875 | ± | 0.0282 |
- professional_psychology | 0 | none | 0 | acc | ↑ | 0.7092 | ± | 0.0184 |
- public_relations | 0 | none | 0 | acc | ↑ | 0.6727 | ± | 0.0449 |
- security_studies | 0 | none | 0 | acc | ↑ | 0.7347 | ± | 0.0283 |
- social_sciences | N/A | none | 0 | acc | ↑ | 0.7910 | ± | 0.0072 |
- sociology | 0 | none | 0 | acc | ↑ | 0.8060 | ± | 0.0280 |
- stem | N/A | none | 0 | acc | ↑ | 0.6581 | ± | 0.0081 |
- us_foreign_policy | 0 | none | 0 | acc | ↑ | 0.8900 | ± | 0.0314 |
- virology | 0 | none | 0 | acc | ↑ | 0.5301 | ± | 0.0389 |
- world_religions | 0 | none | 0 | acc | ↑ | 0.8012 | ± | 0.0306 |
openbookqa | 1 | none | 0 | acc | ↑ | 0.3280 | ± | 0.0210 |
none | 0 | acc_norm | ↑ | 0.4360 | ± | 0.0222 | ||
piqa | 1 | none | 0 | acc | ↑ | 0.7982 | ± | 0.0094 |
none | 0 | acc_norm | ↑ | 0.8074 | ± | 0.0092 | ||
truthfulqa | N/A | none | 0 | acc | ↑ | 0.4746 | ± | 0.0116 |
none | 0 | bleu_acc | ↑ | 0.4700 | ± | 0.0175 | ||
none | 0 | bleu_diff | ↑ | 0.3214 | ± | 0.6045 | ||
none | 0 | bleu_max | ↑ | 22.5895 | ± | 0.7122 | ||
none | 0 | rouge1_acc | ↑ | 0.4798 | ± | 0.0175 | ||
none | 0 | rouge1_diff | ↑ | 0.0846 | ± | 0.7161 | ||
none | 0 | rouge1_max | ↑ | 48.7180 | ± | 0.7833 | ||
none | 0 | rouge2_acc | ↑ | 0.4149 | ± | 0.0172 | ||
none | 0 | rouge2_diff | ↑ | -0.4656 | ± | 0.8375 | ||
none | 0 | rouge2_max | ↑ | 34.0585 | ± | 0.8974 | ||
none | 0 | rougeL_acc | ↑ | 0.4651 | ± | 0.0175 | ||
none | 0 | rougeL_diff | ↑ | -0.2804 | ± | 0.7217 | ||
none | 0 | rougeL_max | ↑ | 45.2232 | ± | 0.7971 | ||
- truthfulqa_gen | 3 | none | 0 | bleu_acc | ↑ | 0.4700 | ± | 0.0175 |
none | 0 | bleu_diff | ↑ | 0.3214 | ± | 0.6045 | ||
none | 0 | bleu_max | ↑ | 22.5895 | ± | 0.7122 | ||
none | 0 | rouge1_acc | ↑ | 0.4798 | ± | 0.0175 | ||
none | 0 | rouge1_diff | ↑ | 0.0846 | ± | 0.7161 | ||
none | 0 | rouge1_max | ↑ | 48.7180 | ± | 0.7833 | ||
none | 0 | rouge2_acc | ↑ | 0.4149 | ± | 0.0172 | ||
none | 0 | rouge2_diff | ↑ | -0.4656 | ± | 0.8375 | ||
none | 0 | rouge2_max | ↑ | 34.0585 | ± | 0.8974 | ||
none | 0 | rougeL_acc | ↑ | 0.4651 | ± | 0.0175 | ||
none | 0 | rougeL_diff | ↑ | -0.2804 | ± | 0.7217 | ||
none | 0 | rougeL_max | ↑ | 45.2232 | ± | 0.7971 | ||
- truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.3905 | ± | 0.0171 |
- truthfulqa_mc2 | 2 | none | 0 | acc | ↑ | 0.5587 | ± | 0.0156 |
winogrande | 1 | none | 0 | acc | ↑ | 0.7388 | ± | 0.0123 |
Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
---|---|---|---|---|---|---|---|---|
agieval | N/A | none | 0 | acc | ↑ | 0.5381 | ± | 0.0049 |
none | 0 | acc_norm | ↑ | 0.5715 | ± | 0.0056 | ||
mmlu | N/A | none | 0 | acc | ↑ | 0.6942 | ± | 0.0037 |
- humanities | N/A | none | 0 | acc | ↑ | 0.6323 | ± | 0.0067 |
- other | N/A | none | 0 | acc | ↑ | 0.7287 | ± | 0.0077 |
- social_sciences | N/A | none | 0 | acc | ↑ | 0.7910 | ± | 0.0072 |
- stem | N/A | none | 0 | acc | ↑ | 0.6581 | ± | 0.0081 |
truthfulqa | N/A | none | 0 | acc | ↑ | 0.4746 | ± | 0.0116 |
none | 0 | bleu_acc | ↑ | 0.4700 | ± | 0.0175 | ||
none | 0 | bleu_diff | ↑ | 0.3214 | ± | 0.6045 | ||
none | 0 | bleu_max | ↑ | 22.5895 | ± | 0.7122 | ||
none | 0 | rouge1_acc | ↑ | 0.4798 | ± | 0.0175 | ||
none | 0 | rouge1_diff | ↑ | 0.0846 | ± | 0.7161 | ||
none | 0 | rouge1_max | ↑ | 48.7180 | ± | 0.7833 | ||
none | 0 | rouge2_acc | ↑ | 0.4149 | ± | 0.0172 | ||
none | 0 | rouge2_diff | ↑ | -0.4656 | ± | 0.8375 | ||
none | 0 | rouge2_max | ↑ | 34.0585 | ± | 0.8974 | ||
none | 0 | rougeL_acc | ↑ | 0.4651 | ± | 0.0175 | ||
none | 0 | rougeL_diff | ↑ | -0.2804 | ± | 0.7217 | ||
none | 0 | rougeL_max | ↑ | 45.2232 | ± | 0.7971 |