---
license: other
license_name: yi-license
license_link: LICENSE
widget:
- text: 你好! 你叫什么名字!
  output:
    text: 你好,我的名字叫聚言,很高兴见到你。
pipeline_tag: text-generation
model-index:
- name: OrionStar-Yi-34B-Chat-Llama
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.93
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.34
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 73.67
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 53.35
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.85
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 53.9
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
      name: Open LLM Leaderboard
---
[OrionStarAI/OrionStar-Yi-34B-Chat-Llama](https://huggingface.co/OrionStarAI/OrionStar-Yi-34B-Chat-Llama/tree/main)
*This model is identical to [OrionStarAI/OrionStar-Yi-34B](https://huggingface.co/OrionStarAI/OrionStar-Yi-34B/tree/main);
the only difference is that the tensors have been renamed to follow the LLaMA format, enabling automatic evaluation on the HF leaderboard.*
# Model Introduction
- OrionStar-Yi-34B-Chat from OrionStarAI is based on the open-source Yi-34B model, fine-tuned on a high-quality corpus
of over 15 million sentences. OrionStar-Yi-34B-Chat aims to provide an excellent interactive experience for users in
the large model community.
- The Yi series models, open-sourced by the 01-ai team, have shown impressive performance on various benchmarks in
Chinese, English, and general domains. OrionStar-Yi-34B-Chat further explores the potential of Yi-34B. Through
extensive fine-tuning on a large and high-quality corpus, OrionStar-Yi-34B-Chat performs exceptionally well on
evaluation data. We strive to make it an outstanding open-source alternative in the ChatGPT domain!
- Our fine-tuned model is completely open for academic research, but please adhere to the [agreement](#license) and
the [Yi License](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).
# Model Evaluation Results
We used [OpenCompass](https://opencompass.org.cn) to run 5-shot evaluations on the following general-domain datasets.
The evaluation results of the other models are taken from the
[OpenCompass leaderboard](https://opencompass.org.cn/leaderboard-llm).
| Model | C-Eval | MMLU | CMMLU |
|---------------------------|-----------|--------|-----------|
| **GPT-4** | 69.9 | **83** | 71 |
| **ChatGPT** | 52.5 | 69.1 | 53.9 |
| **Claude-1** | 52 | 65.7 | - |
| **TigerBot-70B-Chat-V2** | 57.7 | 65.9 | 59.9 |
| **WeMix-LLaMA2-70B** | 55.2 | 71.3 | 56 |
| **LLaMA-2-70B-Chat** | 44.3 | 63.8 | 43.3 |
| **Qwen-14B-Chat** | 71.7 | 66.4 | 70 |
| **Baichuan2-13B-Chat** | 56.7 | 57 | 58.4 |
| **OrionStar-Yi-34B-Chat** | **77.71** | 78.32 | **73.52** |
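In the 5-shot setting used above, the harness prepends five solved examples to each test question before querying the model. A minimal sketch of how such a prompt is assembled (a hypothetical helper for illustration, not OpenCompass's actual implementation):

```python
def build_few_shot_prompt(examples, question):
    """Prepend solved (question, answer) pairs to a test question.

    `examples` is a list of (question, answer) tuples; 5-shot
    evaluation uses five such pairs.
    """
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The final question is left unanswered for the model to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Two-shot example for brevity (the table above uses five shots)
demo = [("1 + 1 = ?", "2"), ("2 + 3 = ?", "5")]
prompt = build_few_shot_prompt(demo, "4 + 4 = ?")
print(prompt)
```

The model's continuation after the trailing `Answer:` is then scored against the reference answer.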
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_OrionStarAI__OrionStar-Yi-34B-Chat-Llama)
| Metric |Value|
|---------------------------------|----:|
|Avg. |68.17|
|AI2 Reasoning Challenge (25-Shot)|64.93|
|HellaSwag (10-Shot) |84.34|
|MMLU (5-Shot) |73.67|
|TruthfulQA (0-shot) |53.35|
|Winogrande (5-shot) |78.85|
|GSM8k (5-shot) |53.90|
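The reported average can be reproduced directly from the six per-task scores in the table above (a quick sanity check, not part of the leaderboard's own tooling):

```python
# Per-task scores from the Open LLM Leaderboard table above
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 64.93,
    "HellaSwag (10-Shot)": 84.34,
    "MMLU (5-Shot)": 73.67,
    "TruthfulQA (0-shot)": 53.35,
    "Winogrande (5-shot)": 78.85,
    "GSM8k (5-shot)": 53.90,
}

# Unweighted mean, rounded to two decimals as on the leaderboard
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 68.17
```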