Commit 5d005ec by zhuxunyu (parent da82b05): Update README.md
---
license: apache-2.0
datasets:
- gsm8k
metrics:
- accuracy
---

# Model Card for math-codet5p-770m-py

We distill the math reasoning ability of the large language model gpt-3.5-turbo into the small open-source code language model [Salesforce/codet5p-770m-py](https://huggingface.co/Salesforce/codet5p-770m-py); the resulting math-codet5p-770m-py achieves 44.88% accuracy on the GSM8K test set.

### Model Description

- **Developed by:** Xunyu Zhu
- **Model type:** encoder-decoder
- **Language(s) (NLP):** Python
- **License:** apache-2.0
- **Finetuned from model:** [Salesforce/codet5p-770m-py](https://huggingface.co/Salesforce/codet5p-770m-py)

## Uses

### Direct Use

This model can be loaded with the `AutoModelForSeq2SeqLM` class and uses the same tokenizer as the original [Salesforce/codet5p-770m-py](https://huggingface.co/Salesforce/codet5p-770m-py). When given a question, append the prompt "\nProgram: Let’s design executable python program (return ans) to solve the question." to the input to instruct the model to generate a reasoning program.

```python
import func_timeout
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def safe_execute(code_string: str, keys=None):
    # Run a model-generated program with a 5-second timeout and read its `ans` variable.
    def execute(x):
        try:
            exec(x)
            locals_ = locals()
            if keys is None:
                return locals_.get('ans', None)
            else:
                return [locals_.get(k, None) for k in keys]
        except Exception:
            return None
    try:
        ans = func_timeout.func_timeout(5, execute, args=(code_string,))
    except func_timeout.FunctionTimedOut:
        ans = None
    return ans

checkpoint = "zhuxunyu/math-codet5p-770m-py"
device = "cuda"  # for GPU usage, or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

question = "Question: Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\nProgram: Let’s design executable python program (return ans) to solve the question."
input = tokenizer(question, max_length=256, padding="max_length", truncation=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**input, max_length=256)

generation = tokenizer.decode(output[0], skip_special_tokens=True)
ans = safe_execute(generation)
print(float(ans))
```
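
For reference, `safe_execute` simply runs the generated program and reads the variable `ans` that the program is expected to assign. A hand-written program in the same style (hypothetical, for illustration; the timeout is omitted for brevity) behaves like this:

```python
def run_program(code_string: str):
    # Execute the program in a fresh namespace and read `ans`,
    # mirroring what `safe_execute` does (timeout omitted for brevity).
    env = {}
    try:
        exec(code_string, env)
    except Exception:
        return None
    return env.get('ans', None)

# A program in the style the model is trained to emit for the duck-egg question.
program = (
    "eggs_per_day = 16\n"
    "eaten = 3\n"
    "baked = 4\n"
    "price = 2\n"
    "ans = (eggs_per_day - eaten - baked) * price\n"
)
print(run_program(program))  # 18
```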

## Training Details

### Training Data

We prompt gpt-3.5-turbo to generate reasoning programs that solve the questions in the GSM8K training set, collecting 4 reasoning programs per question. These questions and their corresponding reasoning programs form the training dataset we use to fine-tune the model.
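
The card does not spell out how the generated programs are validated; a common step in this kind of distillation pipeline (an assumption here, not a documented detail of this model) is to execute each candidate program and keep only those whose answer matches the gold label:

```python
def execute_program(code_string: str):
    # Run a candidate program in a fresh namespace and read its `ans` variable.
    env = {}
    try:
        exec(code_string, env)
    except Exception:
        return None
    return env.get('ans', None)

def filter_programs(candidates, gold_answer, tol=1e-4):
    # Keep only candidate programs whose executed answer matches the gold label.
    kept = []
    for prog in candidates:
        ans = execute_program(prog)
        try:
            if ans is not None and abs(float(ans) - gold_answer) < tol:
                kept.append(prog)
        except (TypeError, ValueError):
            continue
    return kept

# Hypothetical gpt-3.5-turbo samples for the duck-egg question (gold answer: 18).
candidates = [
    "ans = (16 - 3 - 4) * 2",   # correct
    "ans = (16 - 3) * 2",       # wrong answer
    "ans = undefined_var * 2",  # crashes, discarded
]
print(filter_programs(candidates, 18.0))
```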

## Evaluation

### Testing Data

The testing data is the GSM8K test set.

### Results

math-codet5p-770m-py achieves 44.88% accuracy on the GSM8K test set.
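
The exact scoring protocol is not given in the card; a plausible sketch (assumed, not taken from the source) is to execute each generated program and count a question as solved when the resulting `ans` parses to a number close to the gold answer:

```python
def is_correct(pred, gold, tol=1e-4):
    # A prediction counts as correct when it parses to a number close to the gold answer.
    try:
        return abs(float(pred) - float(gold)) < tol
    except (TypeError, ValueError):
        return False

def accuracy(preds, golds):
    # Fraction of test questions whose executed answer matches the label.
    return sum(is_correct(p, g) for p, g in zip(preds, golds)) / len(golds)

# Toy example: two of four predictions match their gold answers.
preds = [18.0, 26.0, None, 3.0]
golds = [18, 18, 5, 3]
print(accuracy(preds, golds))  # 0.5
```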

## Citation

**BibTeX:**

```
@misc{zhu2023mathcodet5plus,
  title={math-codet5p-770m-py},
  author={Zhu, Xunyu and Li, Jian and Liu, Yong and Ma, Can and Wang, Weiping},
  year={2023}
}
```