zake7749 committed on
Commit 06666e4 · verified · 1 Parent(s): 07c7efa

Update README.md

Files changed (1)
  1. README.md +28 -8
README.md CHANGED
@@ -4,10 +4,11 @@ language:
 - zh
 ---
 
+
 # Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning
 
 <p align="left">
-🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp | 🚀<a href="https://github.com/zake7749/kyara">Github</a>&nbsp | &nbsp📑 <a href="#">Paper</a>&nbsp | &nbsp📖 <a href="#">English</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara">Chinese</a>
+🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp | 🚀<a href="https://github.com/zake7749/kyara">Github</a>&nbsp | &nbsp📑 <a href="#">Paper</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara">Chinese</a>
 </p>
 <div style="text-align: center;">
 <img src="https://i.imgur.com/QiWlcYJ.jpeg" alt="kyara"/>
@@ -61,16 +62,35 @@ All evaluations are based-on zero-shot.
 | **[CRUX](https://github.com/yuchenlin/ZeroEval)** | **21.25**| 20.75 |
 | **Chinese-Reason-Bench** | **3.97** | 3.44 |
 
-The aggregation method for TMMLUPlus's groups is macro average, following the practice in the official implementation.
+The aggregation method for the groups in TMMLUPlus is macro average, following the practice in the official implementation.
 
 ### **Alignment Benchmark**
 
-| Metric | Kyara | Gemma-2-2b-it |
+| Metric | Kyara-2b-it | Gemma-2-2b-it |
 |--------------------------|----------|-------------|
-| **[AlpacaEval-LC](https://github.com/tatsu-lab/alpaca_eval)** | **41.15**| 32.37 |
-| **[AlpacaEval](https://github.com/tatsu-lab/alpaca_eval)** | **43.95**| 32.94 |
+| **[AlpacaEval LC Win Rate](https://github.com/tatsu-lab/alpaca_eval)** | **41.15** | 32.37 |
+| **[AlpacaEval Win Rate](https://github.com/tatsu-lab/alpaca_eval)** | **43.95** | 32.94 |
 | **[MT-Bench-TW](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2)** | **6.94** | 6.59 |
 | **[MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench)** | 7.92 | **8.17** |
+| **[Chatbot-Arena-Hard](https://github.com/lm-sys/arena-hard-auto)** | 18.9 | **19.4** |
+
+#### [AlignBench](https://github.com/THUDM/AlignBench)
+
+| Fold | Kyara-2b-it-CHT | Kyara-2b-it-CHS | Gemma-2-2b-it |
+|---------------------------------|-----------------|-----------------|---------------|
+| Fundamental Language Ability | 6.45 | **6.54** | 6.42 |
+| Advanced Chinese Understanding | **5.27** | 5.20 | 5.03 |
+| Open-ended Questions | 7.71 | **7.97** | 7.52 |
+| Writing Ability | 7.98 | **8.06** | 7.76 |
+| Logical Reasoning | **4.68** | 4.52 | 4.20 |
+| Mathematics | **5.10** | 4.72 | 5.05 |
+| Task-oriented Role Play | 7.87 | **8.05** | 7.42 |
+| Professional Knowledge | **6.68** | 6.32 | 5.79 |
+| Reasoning AVG. | **4.89** | 4.62 | 4.63 |
+| Chinese Language AVG. | 6.99 | **7.02** | 6.66 |
+| Overall | **5.94** | 5.82 | 5.64 |
+
+where the postfixes CHT and CHS represent Traditional Chinese and Simplified Chinese, respectively. To evaluate the performance on Traditional Chinese in AlignBench, we used [OpenCC](https://github.com/BYVoid/OpenCC) with the `s2twp` configuration to convert all questions, evidence, and references from Simplified Chinese to Traditional Chinese.
 
 ## Feature
 
@@ -293,7 +313,7 @@ We followed the original design, using Kyara-SFT to generate a set of contrastiv
 
 ##### RLAIF
 
-資料集:[zake7749/kyara-chinese-preference-dpo-s0-30K](https://huggingface.co/datasets/zake7749/kyara-chinese-preference-dpo-s0-30K)
+Dataset: [zake7749/kyara-chinese-preference-dpo-s0-30K](https://huggingface.co/datasets/zake7749/kyara-chinese-preference-dpo-s0-30K)
 
 We extracted Chinese Prompts from `Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese`, `hfl/stem_zh_instruction`, and `FreedomIntelligence/Evol-Instruct-Chinese-GPT4`, and distributed the same prompt to four different LLMs. The competitors include:
 
@@ -348,7 +368,7 @@ Like most models, Kyara still suffers from hallucinations, and possibly due to t
 如何解釋「俠者,不受名利所牽,但求俯仰無愧。正邪存乎在我,聲名留問諸君。非我桀傲不馴,乃因唐門薪火,自古如今!」
 ```
 
-The model would respond that this quote is from The "Legend of the Condor Heroes."
+However, the model would respond that this quote is from The "Legend of the Condor Heroes."
 
 ```
 這段話出自《射鵰英雄傳》中的唐門內部經典,由郭靖和黃蓉在唐門大殿上宣讀。這段話主要表達了唐門門人對於俠義精神和內在修為的追求,以及他們對外界的名利和聲名的不屑。以下是詳細解釋:
@@ -368,4 +388,4 @@ The model would respond that this quote is from The "Legend of the Condor Heroes
 總結來說,這段話強調了唐門門人對名利和聲名的不屑,他們追求的是內在的修為和道德,不為外界的評價所動搖。這也體現了他們俠義精神和對正義的堅持。
 ```
 
-It is recommended to exercise caution when using language models.
+It is recommended to exercise caution when using language models.