yuexiang96 committed on
Commit 71e925d
1 Parent(s): 229d28b

Update README.md

Files changed (1)
  1. README.md +20 -11
README.md CHANGED
@@ -37,17 +37,26 @@ The models are fine-tuned with the MathInstruct dataset using the original Llama
  The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
 
 
- | Model | Size | Base | GSM8K | MATH | AQuA | NumGLUE | IID Avg | SVAMP | Mathematics | SimulEq | SAT-Math | MMLU-Math | OOD Avg |
- |-------------------|-------|---------------|-----------|-------|-------|-----------|---------------|-----------|---------------|-----------|-----------|---------------|---------------|
- | | | | | | | | | | | | | | |
- | MAmmoTH | 7B | Llama-2 | 51.7 | 31.2 | 42.9 | 53.1 | 44.7 | 66.7 | 44.8 | 42 | 36.4 | 38.6 | 45.7 |
- | MAmmoTH-Coder | 7B | Code-Llama | 58.8 | 35.2 | 43 | 57.1 | 48.5 | 71.1 | 53.9 | 44.6 | 40 | 40.5 | 50.2 |
- | MAmmoTH | 13B | Llama-2 | 61.7 | 36 | 44.8 | 59.6 | 50.5 | 72.4 | 48.7 | 40.5 | 42.7 | 45.3 | 49.9 |
- | MAmmoTH-Coder | 13B | Code-Llama | 64.3 | 38.6 | 46.1 | 54.2 | 50.8 | 73.2 | 60 | 44.1 | 40.9 | 45.2 | 52.6 |
- | MAmmoTH-Coder | 34B | Code-Llama | 72.3 | 46.8 | 50.8 | 59.6 | 57.3 | 84 | 64.7 | 50.6 | 51.8 | 50.2 | 60.3 |
- | MAmmoTH | 70B | Llama-2 | 76.7 | 44.2 | 61.4 | 64.3 | 61.7 | 81.7 | 55.3 | 45.3 | 58.6 | 52.3 | 58.6 |
-
-
+ | **Model** | **Decoding** | **GSM** | **MATH** | **AQuA** | **NumG** | **SVA** | **Mat** | **Sim** | **SAT** | **MMLU** | **AVG** |
+ |-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
+ | **MAmmoTH-7B** | CoT | 50.5 | 10.4 | 43.7 | 44.0 | 47.3 | 9.2 | 18.9 | 32.7 | 39.9 | 33.0 |
+ | | PoT | 51.6 | 28.7 | 43.3 | 52.3 | 65.1 | 41.9 | 48.2 | 39.1 | 44.6 | 46.1 |
+ | | **Hybrid** | **53.6** | **31.5** | **44.5** | **61.2** | **67.7** | **46.3** | **41.2** | **42.7** | **42.6** | **47.9** |
+ | **MAmmoTH-Coder-7B** | CoT | 22.4 | 7.9 | 36.2 | 36.0 | 37.0 | 8.2 | 7.2 | 32.7 | 34.6 | 24.7 |
+ | | PoT | 58.8 | 32.1 | 47.2 | 57.1 | 71.1 | 53.9 | 44.6 | 40.0 | 47.8 | 50.3 |
+ | | **Hybrid** | **59.4** | **33.4** | **47.2** | **66.4** | **71.4** | **55.4** | **45.9** | **40.5** | **48.3** | **52.0** |
+ | **MAmmoTH-13B** | CoT | 56.3 | 12.9 | 45.3 | 45.6 | 53.8 | 11.7 | 22.4 | 43.6 | 42.3 | 37.1 |
+ | | PoT | 61.3 | 32.6 | 48.8 | 59.6 | 72.2 | 48.5 | 40.3 | 46.8 | 45.4 | 50.6 |
+ | | **Hybrid** | **62.0** | **34.2** | **51.6** | **68.7** | **72.4** | **49.2** | **43.2** | **46.8** | **47.6** | **52.9** |
+ | **MAmmoTH-Coder-13B** | CoT | 32.1 | 10.2 | 40.6 | 36.2 | 43.0 | 9.6 | 10.1 | 40.9 | 36.6 | 28.8 |
+ | | PoT | 64.3 | 35.2 | 46.8 | 54.2 | 73.2 | 60.0 | 44.2 | 48.2 | 48.2 | 52.7 |
+ | | **Hybrid** | **64.7** | **36.3** | **46.9** | **66.8** | **73.7** | **61.5** | **47.1** | **48.6** | **48.3** | **54.9** |
+ | **MAmmoTH-Coder-33B** | CoT | 34.3 | 11.6 | 39.0 | 36.2 | 44.6 | 10.8 | 10.9 | 46.4 | 42.9 | 30.7 |
+ | | PoT | 72.3 | 42.8 | 53.8 | 59.6 | 84.0 | 64.7 | 50.6 | 58.6 | 52.7 | 59.9 |
+ | | **Hybrid** | **72.7** | **43.6** | **54.7** | **71.6** | **84.3** | **65.4** | **51.8** | **60.9** | **53.8** | **62.1** |
+ | **MAmmoTH-70B** | CoT | 72.4 | 21.1 | 57.9 | 58.9 | 71.6 | 20.0 | 31.9 | 57.3 | 52.1 | 49.2 |
+ | | PoT | 76.7 | 40.1 | 60.2 | 64.3 | 81.7 | 55.3 | 45.3 | 64.1 | 53.5 | 60.1 |
+ | | **Hybrid** | **76.9** | **41.8** | **65.0** | **74.4** | **82.4** | **55.6** | **51.4** | **66.4** | **56.7** | **63.4** |
 
  ## Usage
  You can use the models through Huggingface's Transformers library. Use the pipeline function to create a text-generation pipeline with the model of your choice, then feed in a math problem to get the solution.
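The Usage context in the README describes building a Transformers text-generation pipeline and feeding it a math problem. The following is a minimal sketch of that flow, not the authors' own snippet. The repo id `TIGER-Lab/MAmmoTH-7B` is an assumption (this diff never names a checkpoint), `make_generator` and `solve` are hypothetical helper names, and the raw question is passed as the prompt because no instruction template appears in this commit.

```python
from typing import Callable, Dict, List

# Assumed Hugging Face repo id (not given in this commit); swap in the checkpoint you want.
MODEL_ID = "TIGER-Lab/MAmmoTH-7B"


def make_generator(model_id: str = MODEL_ID):
    """Build a text-generation pipeline; the checkpoint is downloaded on first use."""
    # Imported lazily so solve() can be exercised without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def solve(generator: Callable[..., List[Dict[str, str]]], question: str,
          max_new_tokens: int = 256) -> str:
    """Feed one math problem to the pipeline and return only the generated continuation."""
    # return_full_text=False makes the pipeline omit the prompt from its output.
    outputs = generator(question, max_new_tokens=max_new_tokens, return_full_text=False)
    return outputs[0]["generated_text"].strip()


if __name__ == "__main__":
    gen = make_generator()
    print(solve(gen, "Janet has 3 apples and buys 5 more. How many apples does she have?"))
```

Keeping model loading separate from `solve` means the (large) pipeline is created once and reused across many questions.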
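The new table's AVG column appears to be the unweighted mean of the nine per-dataset scores (GSM through MMLU). The commit does not state this, so the sketch below only checks that assumption against four rows copied verbatim from the added table, allowing for rounding to one decimal place. `ROWS` and `matches_reported` are illustrative names.

```python
from statistics import fmean

# Nine per-dataset scores (GSM, MATH, AQuA, NumG, SVA, Mat, Sim, SAT, MMLU) and the
# reported AVG, copied from rows of the table added in this commit.
ROWS = {
    "MAmmoTH-7B / CoT": ([50.5, 10.4, 43.7, 44.0, 47.3, 9.2, 18.9, 32.7, 39.9], 33.0),
    "MAmmoTH-Coder-7B / PoT": ([58.8, 32.1, 47.2, 57.1, 71.1, 53.9, 44.6, 40.0, 47.8], 50.3),
    "MAmmoTH-13B / Hybrid": ([62.0, 34.2, 51.6, 68.7, 72.4, 49.2, 43.2, 46.8, 47.6], 52.9),
    "MAmmoTH-70B / Hybrid": ([76.9, 41.8, 65.0, 74.4, 82.4, 55.6, 51.4, 66.4, 56.7], 63.4),
}


def matches_reported(scores, reported, tol=0.05):
    """True if the unweighted mean of scores is within rounding distance of the reported AVG."""
    # A tiny epsilon absorbs floating-point error at the exact rounding boundary.
    return abs(fmean(scores) - reported) <= tol + 1e-9


if __name__ == "__main__":
    for name, (scores, reported) in ROWS.items():
        mean = fmean(scores)
        print(f"{name}: mean={mean:.2f} reported={reported} ok={matches_reported(scores, reported)}")
```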