---
license: cc-by-nc-4.0
---

# ReFT: Reasoning with REinforced Fine-Tuning

Paper: https://arxiv.org/pdf/2401.08967.pdf

Repo: https://github.com/lqtrung1998/mwp_ReFT (under the [Apache 2.0 License](https://github.com/lqtrung1998/mwp_ReFT/blob/main/License.txt))

## Introduction

We introduce REinforced Fine-Tuning (ReFT), a method that enhances the generalizability of LLMs trained for reasoning.

This repository contains:
- A supervised fine-tuned (SFT) model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-hf-SFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-hf-SFT-GSM8k)
- A warm-up supervised fine-tuned model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-hf-SFT-warmup-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-hf-SFT-warmup-GSM8k)
- A REinforced Fine-Tuned (ReFT) model on the GSM8k benchmark: [lqtrung1998/galactica-6.7b-hf-ReFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-hf-ReFT-GSM8k)
- A Rerank model that scores the fine-tuned model's outputs: [lqtrung1998/galactica-6.7b-hf-ReFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-hf-ReFT-Rerank-GSM8k)

Note: Our models are tuned from Galactica; therefore, licenses that apply to Galactica, such as the non-commercial CC BY-NC 4.0 license, also apply to these models.

## Training Data

The model is trained on GSM8k data in the Python SDP CoT format, which can be found [here](https://github.com/lqtrung1998/mwp_ReFT).

## Training Procedure

Check out our paper and repo for complete details.

#### ReFT model

The ReFT model is first warmed up via supervised fine-tuning on the GSM8k Python SDP training data for 2 epochs, then REinforced Fine-Tuned for 300 epochs using the questions in the GSM8k training set.

#### Rerank model

The Rerank model is trained to classify whether an output CoT is correct, using data sampled from the ReFT model after the 2-epoch warm-up.
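
The reinforcement stage above relies on a terminal reward computed by executing the sampled Python CoT and comparing its result to the gold answer. A minimal sketch of such a reward function (the helper name, tolerance, and `exec`-based execution are illustrative assumptions, not the repo's exact code):

```python
def reward(generated_cot: str, gold_answer: float) -> float:
    """Terminal reward in ReFT-style RL: 1.0 if executing the generated
    program yields the gold answer, else 0.0.

    Illustrative sketch only; in practice, untrusted generated programs
    should be executed in a sandbox."""
    namespace: dict = {}
    try:
        exec(generated_cot, namespace)          # defines solution()
        prediction = namespace["solution"]()
        return 1.0 if abs(prediction - gold_answer) < 1e-6 else 0.0
    except Exception:
        # Programs that fail to parse or run earn zero reward.
        return 0.0
```

A failed execution (syntax error, missing `solution`, runtime error) simply earns zero reward, so the policy is never penalized more for broken programs than for wrong answers.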

## Evaluation Results

See evaluation results for the models in Table 4 of the paper.

## Usage

You can use the models through Hugging Face's Transformers library or follow the scripts in our repo.

Prompt format:
```python
Question:
Weng earns $12 an hour for babysitting. Yesterday, she
just did 50 minutes of babysitting. How much did she earn?
Answer reasoning:
```

Expected response:
```python
def solution():
    """Weng earns $12 an hour for babysitting. Yesterday, she just did
    50 minutes of babysitting. How much did she earn?"""
    hourly_rate = 12
    minutes_worked = 50
    hours_worked = minutes_worked / 60
    earnings = hourly_rate * hours_worked
    result = earnings
    return result
```
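
The prompt format above can be wired up with Transformers as follows. This is a minimal sketch: the standard `AutoTokenizer`/`AutoModelForCausalLM` loading and the decoding settings are illustrative assumptions, not the repo's exact inference script.

```python
MODEL_ID = "lqtrung1998/galactica-6.7b-hf-ReFT-GSM8k"

def build_prompt(question: str) -> str:
    # Prompt template taken from the format shown above.
    return f"Question:\n{question}\nAnswer reasoning:\n"

def generate_cot(question: str, max_new_tokens: int = 256) -> str:
    # Imported here so build_prompt stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and return only the generated program.
    generated = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
```

The returned string is a Python program like the expected response above; executing its `solution()` yields the numeric answer.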

## Citation

Please cite the paper if you use our data, models, or code.
```
@misc{luong2024reft,
      title={ReFT: Reasoning with Reinforced Fine-Tuning},
      author={Trung Quoc Luong and Xinbo Zhang and Zhanming Jie and Peng Sun and Xiaoran Jin and Hang Li},
      year={2024},
      eprint={2401.08967},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
|