arampacha committed on
Commit
ad81500
1 Parent(s): abc6e4f

Update README.md

  1. README.md +60 -6
README.md CHANGED
@@ -1,17 +1,61 @@
1
  # GPT-Code-Clippy-125M-APPS
2
 
3
  ## Model Description
4
 
5
- GPT-CC-125M-APPS is a GPT-Neo-1.3B finetuned on APPS dataset. This model is specialized to solve programming tasks.
6
 
7
  ## Training data
8
 
9
- [APPS dataset](https://github.com/hendrycks/apps).
 
 
10
 
11
  ## Training procedure
12
 
13
  The training script used to train this model can be found [here](https://github.com/ncoop57/gpt-code-clippy/blob/camera-ready/training/run_clm_apps.py).
14
 
15
  ## Intended Use and Limitations
16
 
17
  The model is finetuned to solve programming problems given a text description and optional starter code.
@@ -24,9 +68,9 @@ You can use this model directly with a pipeline for text generation. This exampl
24
 
25
  from transformers import AutoModelForCausalLM, AutoTokenizer, FlaxAutoModelForCausalLM
26
 
27
- model = AutoModelForCausalLM.from_pretrained("flax-community/gpt-code-clippy-1.3B-apps-alldata")
28
 
29
- tokenizer = AutoTokenizer.from_pretrained("flax-community/gpt-code-clippy-1.3B-apps-alldata")
30
 
31
  prompt = """
32
 
@@ -54,8 +98,18 @@ print(tokenizer.decode(out[0][start:]))
54
 
55
  The model is intended to be used for research purposes and comes with no guarantees of quality of generated code.
56
 
57
- GPT-CC is a finetuned GPT-Neo model and might have inherited biases and limitations from it. See [GPT-Neo model card](https://huggingface.co/EleutherAI/gpt-neo-1.3B#limitations-and-biases) for details.
58
 
59
  ## Eval results
60
 
61
- Coming soon...
 
1
+ ---
2
+ language:
3
+ - en
4
+ - python
5
+ license: MIT
6
+ tags:
7
+ - gpt_neo
8
+ - code_synthesis
9
+ datasets:
10
+ - apps
11
+
12
+ ---
13
+
14
  # GPT-Code-Clippy-125M-APPS
15
 
16
  ## Model Description
17
 
18
+ GPT-CC-125M-APPS is a GPT-Neo-125M model finetuned on the APPS dataset. It is specialized to solve programming tasks.
19
 
20
  ## Training data
21
 
22
+ The model is trained on the [Automated Programming Progress Standard (APPS) dataset](https://github.com/hendrycks/apps). The dataset consists of 10,000 coding problems in total, with 131,836 test cases for checking solutions and 232,444 ground-truth solutions written by humans. Problems can be complicated, as the average length of a problem is 293.2 words. The data are split evenly into training and test sets, with 5,000 problems each.
23
+
24
+ This model is fine-tuned on most of the APPS dataset, including both the train and test splits, to explore the impact of this training task on model performance on other code synthesis evaluation metrics. A model fine-tuned on the train set only can be found [here](https://huggingface.co/flax-community/gpt-neo-1.3B-apps).
25
 
26
  ## Training procedure
27
 
28
  The training script used to train this model can be found [here](https://github.com/ncoop57/gpt-code-clippy/blob/camera-ready/training/run_clm_apps.py).
29
 
30
+ Training runs for 5 epochs using the AdamW optimizer and a linear decay learning rate schedule with 800 warmup steps. To reproduce the training, use this command with the above script:
31
+
32
+ ```
33
+ python run_clm_apps.py \
34
+ --output_dir ./gpt-neo-125M-apps \
35
+ --model_name_or_path EleutherAI/gpt-neo-125M \
36
+ --dataset_name ./apps.py \
37
+ --dataset_config_name formatted \
38
+ --do_train --do_eval \
39
+ --block_size="1024" \
40
+ --per_device_train_batch_size="3" \
41
+ --per_device_eval_batch_size="3" \
42
+ --preprocessing_num_workers="16" \
43
+ --learning_rate="8e-5" \
44
+ --warmup_steps="800" \
45
+ --adam_beta1="0.9" \
46
+ --adam_beta2="0.98" \
47
+ --weight_decay="0.1" \
48
+ --overwrite_output_dir \
49
+ --num_train_epochs="5" \
50
+ --logging_steps="50" \
51
+ --eval_steps="2000" \
52
+ --report_to="wandb" \
53
+ --dtype="bfloat16" \
54
+ --save_strategy epoch \
55
+ --gradient_accumulation_steps 1 \
56
+ --all_data true
57
+ ```
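
As a rough illustration of the schedule described above, the sketch below computes a learning rate that rises linearly over 800 warmup steps to the peak of 8e-5 and then decays linearly. It assumes warmup starts at zero and decay ends at zero. `total_steps` is a hypothetical parameter (the total number of optimizer steps is not stated here), and the function name is illustrative rather than taken from the training script.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=8e-5, warmup_steps=800):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: warmup starts from 0 and decay ends at 0. `total_steps`
    is hypothetical and depends on dataset size and batch size.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    decay_steps = max(1, total_steps - warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / decay_steps


if __name__ == "__main__":
    # Print the rate at a few points of a hypothetical 10,000-step run.
    for s in (0, 400, 800, 5400, 10000):
        print(s, linear_warmup_decay_lr(s, total_steps=10000))
```

The peak value and warmup length mirror the `--learning_rate` and `--warmup_steps` flags in the command; everything else is an assumption for illustration.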
58
+
59
  ## Intended Use and Limitations
60
 
61
  The model is finetuned to solve programming problems given a text description and optional starter code.
 
68
 
69
  from transformers import AutoModelForCausalLM, AutoTokenizer, FlaxAutoModelForCausalLM
70
 
71
+ model = AutoModelForCausalLM.from_pretrained("flax-community/gpt-neo-1.3B-apps-all-2")
72
 
73
+ tokenizer = AutoTokenizer.from_pretrained("flax-community/gpt-neo-1.3B-apps-all-2")
74
 
75
  prompt = """
76
 
 
98
 
99
  The model is intended to be used for research purposes and comes with no guarantees of quality of generated code.
100
 
101
+ The paper ["Evaluating Large Language Models Trained on Code"](https://arxiv.org/abs/2107.03374) from OpenAI has a good discussion of the potential impact of large language models trained on code. Some parts of that discussion are highlighted here as they pertain to this dataset and to models that may be trained on it, **along with some differences in views from the paper, particularly around legal implications**.
102
+
103
+ 1. **Over-reliance:** This model may generate plausible solutions that appear correct but are not. Failing to properly evaluate the generated code may have negative consequences, such as the introduction of bugs or security vulnerabilities. Therefore, it is important that users are aware of the limitations and potential negative consequences of using this language model.
104
+
105
+ 2. **Economic and labor market impacts:** Large language models trained on large code datasets such as this one, and capable of generating high-quality code, have the potential to automate part of the software development process. This may negatively impact software developers. However, as discussed in the paper and shown in the Summary Report for software developers from [O*NET OnLine](https://www.onetonline.org/link/summary/15-1252.00), developers don't just write software.
106
+
107
+ 3. **Biases:** The model is trained on data containing prompt questions formatted in a specific way. The performance of the model can be worse if the prompt formatting differs from that used in the APPS dataset.
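
To make the point about prompt formatting concrete, here is a minimal sketch of a helper that assembles a prompt in an APPS-style layout. The exact marker strings (`QUESTION:`, `Use Standard Input format`, `Use Call-Based format`, `ANSWER:`) are an assumption based on the reference generation script in the APPS repository, since the prompt in this model card is truncated in the diff. The function name and parameters are hypothetical.

```python
from typing import Optional


def build_apps_prompt(
    question: str,
    starter_code: Optional[str] = None,
    call_based: bool = False,
) -> str:
    """Assemble an APPS-style prompt (assumed layout, see note above)."""
    prompt = "\nQUESTION:\n" + question
    if starter_code:
        # Optional starter code follows the problem statement.
        prompt += "\n" + starter_code
    # Assumed format hint: call-based problems expect a function,
    # the others read from standard input.
    if call_based:
        prompt += "\nUse Call-Based format"
    else:
        prompt += "\nUse Standard Input format"
    prompt += "\nANSWER:\n"
    return prompt


if __name__ == "__main__":
    print(build_apps_prompt("Read two integers and print their sum."))
```

If the layout used during fine-tuning differs from this assumption, the helper should be adjusted to match it, since mismatched formatting is exactly the failure mode described above.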
110
+
111
+ GPT-CC is a finetuned GPT-Neo model and might have inherited biases and limitations from it. See [GPT-Neo model card](https://huggingface.co/EleutherAI/gpt-neo-125M#limitations-and-biases) for details.
112
 
113
  ## Eval results
114
 
115
+ Coming soon...