hon9kon9ize/CantoneseLLM-6B-preview202402

indiejoseph

leaderboard-pr-bot committed on Mar 4

Commit e13c3b7 • 1 Parent(s): 2ba1246

Adding Evaluation Results (#1)

- Adding Evaluation Results (aeb692c2dac465ba1f3798cf9d45ce4a735ae71b)

Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)

README.md +119 -3

README.md CHANGED

@@ -1,10 +1,113 @@
 ---
 license: other
 license_name: yi-license
 license_link: https://huggingface.co/01-ai/Yi-6B/blob/main/LICENSE
-language:
-- yue
 pipeline_tag: text-generation
 ---
 # CantoneseLLM
@@ -50,4 +153,17 @@ output = tokenizer.decode(output[0], skip_special_tokens=True)
 The model is intended to use for Cantonese language understanding and generation tasks, it may not be suitable for other Chinese languages. The model is trained on a diverse range of Cantonese text, including news, Wikipedia, and textbooks, it may not be suitable for informal or dialectal Cantonese, it may contain bias and misinformation, please use it with caution.
-We found the model is not well trained on the updated Hong Kong knowledge, it may due to the corpus is not large enough to brainwash the original model. We will continue to improve the model and corpus in the future.

 ---
+language:
+- yue
 license: other
 license_name: yi-license
 license_link: https://huggingface.co/01-ai/Yi-6B/blob/main/LICENSE
 pipeline_tag: text-generation
+model-index:
+- name: CantoneseLLM-6B-preview202402
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 55.63
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hon9kon9ize/CantoneseLLM-6B-preview202402
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 75.8
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hon9kon9ize/CantoneseLLM-6B-preview202402
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 63.07
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hon9kon9ize/CantoneseLLM-6B-preview202402
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 42.26
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hon9kon9ize/CantoneseLLM-6B-preview202402
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 74.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hon9kon9ize/CantoneseLLM-6B-preview202402
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 30.71
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hon9kon9ize/CantoneseLLM-6B-preview202402
+      name: Open LLM Leaderboard
 ---
 # CantoneseLLM
 The model is intended to use for Cantonese language understanding and generation tasks, it may not be suitable for other Chinese languages. The model is trained on a diverse range of Cantonese text, including news, Wikipedia, and textbooks, it may not be suitable for informal or dialectal Cantonese, it may contain bias and misinformation, please use it with caution.
+We found the model is not well trained on the updated Hong Kong knowledge, it may due to the corpus is not large enough to brainwash the original model. We will continue to improve the model and corpus in the future.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_hon9kon9ize__CantoneseLLM-6B-preview202402)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |56.93|
+|AI2 Reasoning Challenge (25-Shot)|55.63|
+|HellaSwag (10-Shot)              |75.80|
+|MMLU (5-Shot)                    |63.07|
+|TruthfulQA (0-shot)              |42.26|
+|Winogrande (5-shot)              |74.11|
+|GSM8k (5-shot)                   |30.71|
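
The `Avg.` row in the table added by this commit appears to be the unweighted arithmetic mean of the six benchmark scores. The diff itself does not say how the average is computed, so that is an assumption. A minimal sketch that checks it against the values listed above:

```python
# Scores copied from the results table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 55.63,
    "HellaSwag (10-Shot)": 75.80,
    "MMLU (5-Shot)": 63.07,
    "TruthfulQA (0-shot)": 42.26,
    "Winogrande (5-shot)": 74.11,
    "GSM8k (5-shot)": 30.71,
}

# Assumption: the leaderboard average is a plain mean with equal weights.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")  # prints "Avg. = 56.93"
```

The result matches the `Avg.` value of 56.93 reported in the table, which is consistent with equal weighting across the six tasks.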