Dongwei committed
Commit
18fb372
1 Parent(s): f0098eb

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
```diff
@@ -4,10 +4,10 @@ license: apache-2.0
 
 
 
-# Rationalyst (with rationales extracted from reasoning datasets)
+# Rationalyst
 
 This model is a fine-tuned version of the [LLaMa-3-Instruct-8B](https://huggingface.co/bert-base-uncased). It was
-introduced in RATIONALYST: Supervising Reasoning via Self-Supervised Rationale Extraction. The code for the rationale extraction, model training and
+introduced in [RATIONALYST: Pre-training Process-Supervision for Improving Reasoning](https://arxiv.org/pdf/2410.01044). The code for the rationale extraction, model training, and
 inference can be found [here](https://github.com/JHU-CLSP/reasoning_world_model).
 
 ## Model description
@@ -20,8 +20,7 @@ To use it, simply input question and partial reasoning trajectory, and the model
 
 ## Training data
 
-This Rationalyst is trained using 17566 rationales from GSM8K and 19669 rationales from ECQA. The data used can be found
-[here](https://huggingface.co/datasets/Dongwei/reasoning_world_model)
+This Rationalyst is trained using 65k implicit rationales from The Pile and 14k implicit rationales from GSM8K and ECQA. The data used can be found [here](https://huggingface.co/datasets/Dongwei/reasoning_world_model)
 
 
 ## Evaluation results
@@ -30,4 +29,4 @@ When used to evaluate on downstream tasks, this model achieves the following res
 
 | Task | GSM8K | MATH | ECQA | HellaSwag | ProofWriter | ARC | MMLU-Pro |
 |:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
-| | 80.3 | 31.4 | 74.5 | 59.1 | 88.2 | 78.8 | 41.2 |
+| | 81.6 | 32.5 | 75.2 | 60.3 | 90.7 | 80.7 | 45.3 |
```
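
The card's "Model description" says to supply a question and a partial reasoning trajectory, and the model generates a rationale. A minimal sketch of what that might look like with the `transformers` library, assuming the weights are hosted under a repo id like `Dongwei/reasoning_world_model` (hypothetical; the card only links the dataset under that name) and assuming a plain-text prompt format, which the card does not specify:

```python
# Minimal sketch, NOT the card's documented API: the repo id and the
# "Question / Reasoning so far / Rationale" prompt format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dongwei/reasoning_world_model"  # hypothetical model repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
partial_trajectory = "In April she sold 48 clips. In May she sold 48 / 2 = 24 clips."

# Concatenate the question and the partial trajectory, then let the model
# continue with a rationale for the next reasoning step.
prompt = f"Question: {question}\nReasoning so far: {partial_trajectory}\nRationale:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```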
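The "Training data" section links the rationales as a Hugging Face dataset. A quick way to inspect it, assuming the default configuration loads (the card documents neither configs, splits, nor field names):

```python
from datasets import load_dataset

# Load the rationale dataset linked in the card; using the default config
# is an assumption.
ds = load_dataset("Dongwei/reasoning_world_model")
print(ds)  # shows the available splits and their sizes

# Peek at one record from the first split to discover the field names.
first_split = next(iter(ds.values()))
print(first_split[0])
```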