zubingou committed
Commit 717edbe
1 Parent(s): f352f70

Update README.md

Files changed (1)
  1. README.md +14 -10
README.md CHANGED
@@ -45,16 +45,20 @@ Repo for "<a href="https://arxiv.org/pdf/2309.17452.pdf" target="_blank">ToRA: A

  ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical reasoning problems by interacting with tools, e.g., computation libraries and symbolic solvers. ToRA series seamlessly integrate natural language reasoning with the utilization of external tools, thereby amalgamating the analytical prowess of language and the computational efficiency of external tools.

- | Model | Size | GSM8k | MATH |
- |---|---|---|---|
- | [ToRA-7B](https://huggingface.co/llm-agents/tora-7b-v1.0) | 7B | 68.8 | 40.1 |
- | [ToRA-Code-7B](https://huggingface.co/llm-agents/tora-code-7b-v1.0) | 7B | 72.6 | 44.6 |
- | [ToRA-13B](https://huggingface.co/llm-agents/tora-13b-v1.0) | 13B | 72.7 | 43.0 |
- | [ToRA-Code-13B](https://huggingface.co/llm-agents/tora-code-13b-v1.0) | 13B | 75.8 | 48.1 |
- | [ToRA-Code-34B*](https://huggingface.co/llm-agents/tora-code-34b-v1.0) | 34B | 80.7 | **51.0** |
- | [ToRA-70B](https://huggingface.co/llm-agents/tora-70b-v1.0) | 70B | **84.3** | 49.7 |
-
- *ToRA-Code-34B is currently the first and only open-source model to achieve over 50% accuracy (pass@1) on the MATH dataset, which significantly outperforms GPT-4’s CoT result (51.0 vs. 42.5), and is competitive with GPT-4 solving problems with programs. By open-sourcing our codes and models, we hope more breakthroughs will come!
+ | Model | Size | GSM8k | MATH | AVG@10 math tasks<sup>&dagger;</sup> |
+ |---|---|---|---|---|
+ | GPT-4 | - | 92.0 | 42.5 | 78.3 |
+ | GPT-4 (PAL) | - | 94.2 | 51.8 | 86.4 |
+ | [ToRA-7B](https://huggingface.co/llm-agents/tora-7b-v1.0) | 7B | 68.8 | 40.1 | 62.4 |
+ | [ToRA-Code-7B](https://huggingface.co/llm-agents/tora-code-7b-v1.0) | 7B | 72.6 | 44.6 | 66.5 |
+ | [ToRA-13B](https://huggingface.co/llm-agents/tora-13b-v1.0) | 13B | 72.7 | 43.0 | 65.9 |
+ | [ToRA-Code-13B](https://huggingface.co/llm-agents/tora-code-13b-v1.0) | 13B | 75.8 | 48.1 | 71.3 |
+ | [ToRA-Code-34B<sup>*</sup>](https://huggingface.co/llm-agents/tora-code-34b-v1.0) | 34B | 80.7 | **51.0** | 74.8 |
+ | [ToRA-70B](https://huggingface.co/llm-agents/tora-70b-v1.0) | 70B | **84.3** | 49.7 | **76.9** |
+
+ - <sup>*</sup>ToRA-Code-34B is currently the first and only open-source model to achieve over 50% accuracy (pass@1) on the MATH dataset, which significantly outperforms GPT-4’s CoT result (51.0 vs. 42.5), and is competitive with GPT-4 solving problems with programs. By open-sourcing our codes and models, we hope more breakthroughs will come!
+
+ - <sup>&dagger;</sup>10 math tasks include GSM8k, MATH, GSM-Hard, SVAMP, TabMWP, ASDiv, SingleEQ, SingleOP, AddSub, and MultiArith.


  ## ⚡️ Training
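
The README paragraph in this diff describes tool-integrated reasoning only in prose, so here is a minimal sketch of the control flow it implies: the model alternates natural-language rationale with executable code blocks, and execution output is fed back into the context before generation resumes. This is an illustrative approximation only, assuming the standard `transformers` API; the `run_python` helper, the block delimiters, the stop condition, and `max_rounds` are hypothetical simplifications, not the repo's actual inference pipeline.

```python
# Illustrative sketch of a ToRA-style tool-integrated reasoning loop.
# Assumptions: the ```python / ```output block convention and the naive
# run_python executor are simplifications; the repo's real inference code
# (stop sequences, batching, sandboxed execution) differs.
import contextlib
import io

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "llm-agents/tora-code-7b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")


def run_python(code: str) -> str:
    """Execute model-generated code and capture stdout.

    No sandboxing here; a real deployment must isolate this step."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()


def solve(question: str, max_rounds: int = 3) -> str:
    """Alternate generation and execution until the model stops calling tools."""
    context = question + "\n"
    for _ in range(max_rounds):
        inputs = tokenizer(context, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=512)
        # Decode only the newly generated continuation.
        new_text = tokenizer.decode(
            out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        context += new_text
        if "```python" not in new_text:  # no tool call left: final answer
            break
        # Extract the generated program and feed its output back in.
        code = new_text.split("```python", 1)[1].split("```", 1)[0]
        context += f"\n```output\n{run_python(code)}\n```\n"
    return context
```

The key design point is that execution output re-enters the context, so the model conditions its next rationale on actual computed values rather than on arithmetic it might get wrong.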