Dongwei committed
Commit
18fb372
1 Parent(s): f0098eb

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
```diff
@@ -4,10 +4,10 @@ license: apache-2.0
 
 
 
-# Rationalyst (with rationales extracted from reasoning datasets)
+# Rationalyst
 
 This model is a fine-tuned version of the [LLaMa-3-Instruct-8B](https://huggingface.co/bert-base-uncased). It was
-introduced in RATIONALYST: Supervising Reasoning via Self-Supervised Rationale Extraction. The code for the rationale extraction, model training and
+introduced in [RATIONALYST: Pre-training Process-Supervision for Improving Reasoning](https://arxiv.org/pdf/2410.01044). The code for the rationale extraction, model training, and
 inference can be found [here](https://github.com/JHU-CLSP/reasoning_world_model).
 
 ## Model description
@@ -20,8 +20,7 @@ To use it, simply input question and partial reasoning trajectory, and the model
 
 ## Training data
 
-This Rationalyst is trained using 17566 rationales from GSM8K and 19669 rationales from ECQA. The data used can be found
-[here](https://huggingface.co/datasets/Dongwei/reasoning_world_model)
+This Rationalyst is trained using 65k implicit rationales from The Pile and 14k implicit rationales from GSM8K and ECQA. The data used can be found [here](https://huggingface.co/datasets/Dongwei/reasoning_world_model)
 
 
 ## Evaluation results
@@ -30,4 +29,4 @@ When used to evaluate on downstream tasks, this model achieves the following res
 
 | Task | GSM8K | MATH | ECQA | HellaSwag | ProofWriter | ARC | MMLU-Pro |
 |:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
-| | 80.3 | 31.4 | 74.5 | 59.1 | 88.2 | 78.8 | 41.2 |
+| | 81.6 | 32.5 | 75.2 | 60.3 | 90.7 | 80.7 | 45.3 |
```
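
The card's "Model description" says to supply a question and a partial reasoning trajectory, and the model generates a rationale. A minimal sketch of what that might look like with the `transformers` library, assuming the weights are hosted under a repo id like `Dongwei/reasoning_world_model` (hypothetical; the card only links the dataset under that name) and assuming a plain-text prompt format, which the card does not specify:

```python
# Minimal sketch, NOT the card's documented API: the repo id and the
# "Question / Reasoning so far / Rationale" prompt format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dongwei/reasoning_world_model"  # hypothetical model repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
partial_trajectory = "In April she sold 48 clips. In May she sold 48 / 2 = 24 clips."

# Concatenate the question and the partial trajectory, then let the model
# continue with a rationale for the next reasoning step.
prompt = f"Question: {question}\nReasoning so far: {partial_trajectory}\nRationale:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```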
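The "Training data" section links the rationales as a Hugging Face dataset. A quick way to inspect it, assuming the default configuration loads (the card documents neither configs, splits, nor field names):

```python
from datasets import load_dataset

# Load the rationale dataset linked in the card; using the default config
# is an assumption.
ds = load_dataset("Dongwei/reasoning_world_model")
print(ds)  # shows the available splits and their sizes

# Peek at one record from the first split to discover the field names.
first_split = next(iter(ds.values()))
print(first_split[0])
```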