Files changed (1)
  1. README.md +122 -6
README.md CHANGED
@@ -1,9 +1,12 @@
  ---
  license: other
- license_name: tongyi-qianwen-research
- license_link: https://huggingface.co/Qwen/Qwen1.5-0.5B/blob/main/LICENSE
  datasets:
  - Locutusque/UltraTextbooks-2.0
  inference:
    parameters:
      do_sample: true
@@ -12,9 +15,109 @@ inference:
      top_k: 40
      max_new_tokens: 250
      repetition_penalty: 1.1
- language:
- - en
- - zh
  ---

  # tau-1.8B
@@ -52,4 +155,17 @@ The training of tau-1.8B required computational resources that contribute to the
  tau-1.8B was trained on a diverse dataset that may contain biases and inaccuracies. Users should be aware of these potential limitations and use the model responsibly. The model should not be used for tasks that could cause harm or discriminate against individuals or groups.

  ## Evaluation
- Coming Soon
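The `inference.parameters` block in the frontmatter above declares the card's default generation settings. As a minimal sketch (no model is loaded here), with Hugging Face `transformers` these settings would typically be collected into keyword arguments for `model.generate`; the dict below simply mirrors the YAML values:

```python
# Default generation settings mirrored from the card's `inference.parameters`
# YAML block; with `transformers` they would be passed along the lines of
# `model.generate(**inputs, **gen_kwargs)` (sketch only, no model call is made).
gen_kwargs = {
    "do_sample": True,          # sample instead of greedy decoding
    "top_k": 40,                # restrict sampling to the 40 most likely tokens
    "max_new_tokens": 250,      # cap on the generated continuation length
    "repetition_penalty": 1.1,  # mildly discourage repeated tokens
}
print(sorted(gen_kwargs))
```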
  ---
+ language:
+ - en
+ - zh
  license: other
  datasets:
  - Locutusque/UltraTextbooks-2.0
+ license_name: tongyi-qianwen-research
+ license_link: https://huggingface.co/Qwen/Qwen1.5-0.5B/blob/main/LICENSE
  inference:
    parameters:
      do_sample: true
      top_k: 40
      max_new_tokens: 250
      repetition_penalty: 1.1
+ model-index:
+ - name: tau-1.8B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 37.2
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/tau-1.8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 60.26
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/tau-1.8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 45.96
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/tau-1.8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 39.72
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/tau-1.8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 61.09
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/tau-1.8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 30.17
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/tau-1.8B
+       name: Open LLM Leaderboard
  ---

  # tau-1.8B

  tau-1.8B was trained on a diverse dataset that may contain biases and inaccuracies. Users should be aware of these potential limitations and use the model responsibly. The model should not be used for tasks that could cause harm or discriminate against individuals or groups.

  ## Evaluation
+ Coming Soon
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_M4-ai__tau-1.8B)
+
+ | Metric                          |Value|
+ |---------------------------------|----:|
+ |Avg.                             |45.73|
+ |AI2 Reasoning Challenge (25-Shot)|37.20|
+ |HellaSwag (10-Shot)              |60.26|
+ |MMLU (5-Shot)                    |45.96|
+ |TruthfulQA (0-shot)              |39.72|
+ |Winogrande (5-shot)              |61.09|
+ |GSM8k (5-shot)                   |30.17|
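As a quick sanity check on the appended leaderboard table, the `Avg.` row is the arithmetic mean of the six benchmark scores, rounded to two decimals (values copied from the metrics in the diff; pure Python):

```python
# Benchmark scores from the leaderboard table appended in this diff:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8k.
scores = [37.20, 60.26, 45.96, 39.72, 61.09, 30.17]

# The Avg. row is the plain arithmetic mean, rounded to two decimals.
avg = round(sum(scores) / len(scores), 2)
print(avg)  # 45.73, matching the Avg. row
```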