zake7749 committed on
Commit 06666e4 · verified · 1 Parent(s): 07c7efa

Update README.md

Files changed (1)
  1. README.md +28 -8
README.md CHANGED
@@ -4,10 +4,11 @@ language:
 - zh
 ---
 
+
 # Kyara: Knowledge Yielding Adaptive Retrieval Augmentation for LLM Fine-tuning
 
 <p align="left">
-🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp | 🚀<a href="https://github.com/zake7749/kyara">Github</a>&nbsp | &nbsp📑 <a href="#">Paper</a>&nbsp | &nbsp📖 <a href="#">English</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara">Chinese</a>
+🤗 <a href="https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo">Hugging Face</a>&nbsp | 🚀<a href="https://github.com/zake7749/kyara">Github</a>&nbsp | &nbsp📑 <a href="#">Paper</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara/blob/main/document/README_EN.md">English</a>&nbsp | &nbsp📖 <a href="https://github.com/zake7749/kyara">Chinese</a>
 </p>
 <div style="text-align: center;">
 <img src="https://i.imgur.com/QiWlcYJ.jpeg" alt="kyara"/>
@@ -61,16 +62,35 @@ All evaluations are based-on zero-shot.
 | **[CRUX](https://github.com/yuchenlin/ZeroEval)** | **21.25**| 20.75 |
 | **Chinese-Reason-Bench** | **3.97** | 3.44 |
 
-The aggregation method for TMMLUPlus's groups is macro average, following the practice in the official implementation.
+The aggregation method for the groups in TMMLUPlus is macro average, following the practice in the official implementation.
 
 ### **Alignment Benchmark**
 
-| Metric | Kyara | Gemma-2-2b-it |
+| Metric | Kyara-2b-it | Gemma-2-2b-it |
 |--------------------------|----------|-------------|
-| **[AlpacaEval-LC](https://github.com/tatsu-lab/alpaca_eval)** | **41.15**| 32.37 |
-| **[AlpacaEval](https://github.com/tatsu-lab/alpaca_eval)** | **43.95**| 32.94 |
+| **[AlpacaEval LC Win Rate](https://github.com/tatsu-lab/alpaca_eval)** | **41.15** | 32.37 |
+| **[AlpacaEval Win Rate](https://github.com/tatsu-lab/alpaca_eval)** | **43.95** | 32.94 |
 | **[MT-Bench-TW](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2)** | **6.94** | 6.59 |
 | **[MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench)** | 7.92 | **8.17** |
+| **[Chatbot-Arena-Hard](https://github.com/lm-sys/arena-hard-auto)** | 18.9 | **19.4** |
+
+#### [AlignBench](https://github.com/THUDM/AlignBench)
+
+| Fold | Kyara-2b-it-CHT | Kyara-2b-it-CHS | Gemma-2-2b-it |
+|---------------------------------|-----------------|-----------------|---------------|
+| Fundamental Language Ability | 6.45 | **6.54** | 6.42 |
+| Advanced Chinese Understanding | **5.27** | 5.20 | 5.03 |
+| Open-ended Questions | 7.71 | **7.97** | 7.52 |
+| Writing Ability | 7.98 | **8.06** | 7.76 |
+| Logical Reasoning | **4.68** | 4.52 | 4.20 |
+| Mathematics | **5.10** | 4.72 | 5.05 |
+| Task-oriented Role Play | 7.87 | **8.05** | 7.42 |
+| Professional Knowledge | **6.68** | 6.32 | 5.79 |
+| Reasoning AVG. | **4.89** | 4.62 | 4.63 |
+| Chinese Language AVG. | 6.99 | **7.02** | 6.66 |
+| Overall | **5.94** | 5.82 | 5.64 |
+
+where the postfixes CHT and CHS represent Traditional Chinese and Simplified Chinese, respectively. To evaluate the performance on Traditional Chinese in AlignBench, we used [OpenCC](https://github.com/BYVoid/OpenCC) with the `s2twp` configuration to convert all questions, evidence, and references from Simplified Chinese to Traditional Chinese.
 
 ## Feature
 
@@ -293,7 +313,7 @@ We followed the original design, using Kyara-SFT to generate a set of contrastiv
 
 ##### RLAIF
 
-資料集:[zake7749/kyara-chinese-preference-dpo-s0-30K](https://huggingface.co/datasets/zake7749/kyara-chinese-preference-dpo-s0-30K)
+Dataset: [zake7749/kyara-chinese-preference-dpo-s0-30K](https://huggingface.co/datasets/zake7749/kyara-chinese-preference-dpo-s0-30K)
 
 We extracted Chinese Prompts from `Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese`, `hfl/stem_zh_instruction`, and `FreedomIntelligence/Evol-Instruct-Chinese-GPT4`, and distributed the same prompt to four different LLMs. The competitors include:
 
@@ -348,7 +368,7 @@ Like most models, Kyara still suffers from hallucinations, and possibly due to t
 如何解釋「俠者,不受名利所牽,但求俯仰無愧。正邪存乎在我,聲名留問諸君。非我桀傲不馴,乃因唐門薪火,自古如今!」
 ```
 
-The model would respond that this quote is from The "Legend of the Condor Heroes."
+However, the model would respond that this quote is from The "Legend of the Condor Heroes."
 
 ```
 這段話出自《射鵰英雄傳》中的唐門內部經典,由郭靖和黃蓉在唐門大殿上宣讀。這段話主要表達了唐門門人對於俠義精神和內在修為的追求,以及他們對外界的名利和聲名的不屑。以下是詳細解釋:
@@ -368,4 +388,4 @@ The model would respond that this quote is from The "Legend of the Condor Heroes
 總結來說,這段話強調了唐門門人對名利和聲名的不屑,他們追求的是內在的修為和道德,不為外界的評價所動搖。這也體現了他們俠義精神和對正義的堅持。
 ```
 
-It is recommended to exercise caution when using language models.
+It is recommended to exercise caution when using language models.