metadata

language:
  - zh
  - en
license: apache-2.0
datasets:
  - wenbopan/Fusang-v1
  - wenbopan/OpenOrca-zh-20k

Faro-Yi-9B

Faro-Yi-9B is an improved version of Yi-9B-200K, produced by extensive instruction tuning on Fusang-V1. Compared to Yi-9B-200K, Faro-Yi-9B is more capable on a range of downstream tasks and at long-context modeling, thanks to the large-scale synthetic data in Fusang-V1.

Performance

Faro-Yi-9B improves on Yi-9B-200K in most dimensions, especially long-context modeling and bilingual (English and Chinese) understanding. Faro-Yi-9B is competitive among open-source models at around 9B parameters. It performs well on fact-based benchmarks and is also preferred by LLM judges.

Fact-based Evaluation (Open LLM Leaderboard)

Model	MMLU	GSM8K	HellaSwag	TruthfulQA	ARC	Winogrande
Yi-9B-200K	65.73	50.49	56.72	33.80	69.25	71.67
Faro-Yi-9B	68.80	63.08	57.28	40.86	72.58	71.11
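
The per-benchmark differences in the table above can be computed with a few lines of Python. This is a minimal sketch for reading the table, not part of the original evaluation pipeline; the scores are copied from the table, and the helper name `score_deltas` is a hypothetical one introduced here for illustration.

```python
# Scores copied from the Open LLM Leaderboard table above.
benchmarks = ["MMLU", "GSM8K", "HellaSwag", "TruthfulQA", "ARC", "Winogrande"]
yi_9b_200k = [65.73, 50.49, 56.72, 33.80, 69.25, 71.67]
faro_yi_9b = [68.80, 63.08, 57.28, 40.86, 72.58, 71.11]


def score_deltas(names, baseline, tuned):
    """Hypothetical helper: map each benchmark to (tuned - baseline), rounded to 2 decimals."""
    return {n: round(t - b, 2) for n, b, t in zip(names, baseline, tuned)}


deltas = score_deltas(benchmarks, yi_9b_200k, faro_yi_9b)

# Print one signed delta per benchmark, in table order.
for name, delta in deltas.items():
    print(f"{name:<12}{delta:+.2f}")
```

By this arithmetic, GSM8K shows the largest gain and Winogrande is the only benchmark in the table where Faro-Yi-9B scores slightly below Yi-9B-200K.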

Long-context Modeling (LongBench)

Name	Average_zh	Average_en	Code Completion
Yi-9B-200K	30.288	36.7071	72.2
Faro-Yi-9B	41.092	40.9536	46.0

Score breakdown

Name	Few-shot Learning_en	Synthetic Tasks_en	Single-Doc QA_en	Multi-Doc QA_en	Summarization_en	Few-shot Learning_zh	Synthetic Tasks_zh	Single-Doc QA_zh	Multi-Doc QA_zh	Summarization_zh
Yi-9B-200K	60.6	22.8	30.9	38.9	25.8	46.5	28.0	49.6	17.7	9.7
Faro-Yi-9B	63.8	40.2	36.2	38.0	26.3	30.0	75.1	55.6	30.7	14.1

Bilingual Ability (CMMLU & MMLU)

Name	MMLU	CMMLU
Yi-9B-200K	65.73	71.97
Faro-Yi-9B	68.80	73.28