vllm (pretrained=/root/autodl-tmp/Qwen3-32B-abliterated,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,tensor_parallel_size=4,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.900 | ± | 0.0190 |
| | | strict-match | 5 | exact_match | ↑ | 0.896 | ± | 0.0193 |

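
The Stderr values in the table above appear consistent with the standard error of a mean over per-sample 0/1 scores, computed with the sample (n - 1) variance. That formula is an assumption about how the harness derives the column, not something this card states, and `binary_accuracy_stderr` is a hypothetical helper name. A minimal sketch that reproduces the two reported values from the accuracy and the sample limit of 250:

```python
import math

def binary_accuracy_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumed formula: sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))

# Accuracy values and sample count taken from the 250-sample GSM8K run.
flexible = binary_accuracy_stderr(0.900, 250)
strict = binary_accuracy_stderr(0.896, 250)

print(f"flexible-extract: {flexible:.4f}")  # 0.0190
print(f"strict-match:     {strict:.4f}")    # 0.0193
```

With only 250 samples the standard error is close to two percentage points, so small differences between runs at this limit should be read with caution.
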
vllm (pretrained=/root/autodl-tmp/Qwen3-32B-abliterated,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,tensor_parallel_size=4,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.852 | ± | 0.0159 |
| | | strict-match | 5 | exact_match | ↑ | 0.840 | ± | 0.0164 |
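
The flexible-extract score drops from 0.900 at limit 250 to 0.852 at limit 500. If the 250-sample run covers the first half of the 500-sample run (an assumption about how `limit` selects examples, not something stated here), the accuracy on the second 250 examples can be backed out with simple arithmetic. `remaining_accuracy` is a hypothetical helper for illustration:

```python
def remaining_accuracy(acc_small: float, n_small: int,
                       acc_large: float, n_large: int) -> float:
    """Accuracy on the examples in the larger run that are not in the smaller run.

    Assumes the smaller run is a strict subset (prefix) of the larger one.
    """
    correct_small = round(acc_small * n_small)  # correct answers in the subset
    correct_large = round(acc_large * n_large)  # correct answers in the full run
    return (correct_large - correct_small) / (n_large - n_small)

# GSM8K flexible-extract: 0.900 on 250 samples, 0.852 on 500 samples.
second_half = remaining_accuracy(0.900, 250, 0.852, 500)
print(second_half)  # 0.804
```

Under that assumption, 225 of the first 250 and 201 of the next 250 answers were correct, which shows how much a 250-sample estimate can move with the choice of examples.
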

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7988 | ± | 0.0131 |
| - humanities | 2 | none | | acc | ↑ | 0.7897 | ± | 0.0269 |
| - other | 2 | none | | acc | ↑ | 0.7590 | ± | 0.0298 |
| - social sciences | 2 | none | | acc | ↑ | 0.8722 | ± | 0.0252 |
| - stem | 2 | none | | acc | ↑ | 0.7860 | ± | 0.0230 |

vllm (pretrained=/root/autodl-tmp/Qwen3-32B-abliterated-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.848 | ± | 0.0228 |
| | | strict-match | 5 | exact_match | ↑ | 0.844 | ± | 0.0230 |

vllm (pretrained=/root/autodl-tmp/Qwen3-32B-abliterated-128-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.850 | ± | 0.0160 |
| | | strict-match | 5 | exact_match | ↑ | 0.834 | ± | 0.0167 |

| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.7860 | ± | 0.0133 |
| - humanities | 2 | none | | acc | ↑ | 0.7795 | ± | 0.0272 |
| - other | 2 | none | | acc | ↑ | 0.7538 | ± | 0.0295 |
| - social sciences | 2 | none | | acc | ↑ | 0.8389 | ± | 0.0271 |
| - stem | 2 | none | | acc | ↑ | 0.7789 | ± | 0.0232 |
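
To judge whether the two checkpoints differ meaningfully, the reported values and standard errors can be combined into a rough z-score. This is a sketch under two assumptions: each MMLU table belongs to the model named in the run lines preceding it, and the two runs are treated as independent samples (they likely share the same questions, so a paired test would be more precise). `z_score` is a hypothetical helper name.

```python
import math

def z_score(value_a: float, stderr_a: float,
            value_b: float, stderr_b: float) -> float:
    """Difference of two scores divided by the combined standard error."""
    return (value_a - value_b) / math.hypot(stderr_a, stderr_b)

# Qwen3-32B-abliterated vs. Qwen3-32B-abliterated-128-4096 (values from the tables).
mmlu = z_score(0.7988, 0.0131, 0.7860, 0.0133)
gsm8k = z_score(0.852, 0.0159, 0.850, 0.0160)  # flexible-extract, limit 500

print(f"MMLU  z = {mmlu:.2f}")   # 0.69
print(f"GSM8K z = {gsm8k:.2f}")  # 0.09
```

Both values fall well below 1.96, so at these sample sizes the overall MMLU and GSM8K gaps between the two checkpoints are within noise at the usual 5% level.
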