---
language: code
tags:
- code
- gpt2
- generation
datasets:
- giulio98/xlcost-single-prompt
widget:
- text: "'''\nfunction to add two numbers\n'''\n###\n"
  example_title: "add two numbers"
model-index:
- name: codegen-350M-multi-xlcost
  results:
  - task: 
      name: Code Generation
      type: code-generation
    dataset:
      name: "XLCost" 
      type: code_eval_outputs
    metrics:
       - name: pass@1
         type: code_eval_outputs
         value: 3.325
       - name: pass@10
         type: code_eval_outputs
         value: 15
       - name: codebleu
         type: codebleu
         value: 20.18191
---

# CodeGen-350M-multi-xlcost-v2

CodeGen-350M-multi-xlcost-v2 is a CodeGen model fine-tuned on the Python split of the XLCost dataset using DeepSpeed.

## Usage

You can load the CodeGen-350M-multi-xlcost-v2 model and tokenizer directly in `transformers`:

```Python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")

# The prompt format is: EOS token, a docstring-style task description, then a "###" separator
text = tokenizer.eos_token + "'''\nfunction to add two numbers\n'''\n###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
Output:
```Python
'''
function to add two numbers 
'''
###
def add(a, b):
    return a + b
```
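Since pass@k scores the best of several candidates, evaluation typically samples multiple completions per prompt instead of decoding greedily. A minimal sketch continuing from the snippet above (the sampling parameters here are illustrative assumptions, not the settings behind the reported results):

```Python
# Draw 10 candidate completions per prompt (temperature/top_p values are assumptions)
outputs = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    max_length=128,
    num_return_sequences=10,
    pad_token_id=tokenizer.eos_token_id,
)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```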
## Training

The model was fine-tuned on [XLCost-single-prompt](https://huggingface.co/datasets/giulio98/xlcost-single-prompt), an improved version of the original XLCost dataset [xlcost-text-to-code](https://huggingface.co/datasets/codeparrot/xlcost-text-to-code). The hyperparameters are listed below.

| Hyperparameter | Value |
|---------------------------|--------|
| Per-device train batch size | 16 |
| Context size | 1024 |
| Training steps | 259 |
| Gradient accumulation steps | 2 |
| Gradient checkpointing | True |
| Learning rate | 1.8e-05 |
| Weight decay | 0.1 |
| Warmup steps | 35 |
| Schedule | linear |
| ZeRO stage | 2 |

The DeepSpeed configuration is shown below.
```json
{
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 0.000018,
      "betas": [
        0.9,
        0.999
      ],
      "eps": 1e-8,
      "weight_decay": 0.1
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 0.000018,
      "warmup_num_steps": 35
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": false
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 200000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 200000000,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": 2,
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 16,
  "gradient_clipping": 1,
  "wall_clock_breakdown": false
}
```
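The card does not include the training script, so the following is only a sketch of how this configuration would typically be wired into a `transformers` Trainer run; the script and file names are assumptions:

```Python
# Hypothetical training setup; "ds_config.json" would hold the JSON shown above.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="codegen-350M-multi-xlcost-v2",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=1.8e-5,
    weight_decay=0.1,
    warmup_steps=35,
    lr_scheduler_type="linear",
    fp16=True,
    max_steps=259,
    deepspeed="ds_config.json",
)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# Launched with the DeepSpeed launcher, e.g.: deepspeed train.py
```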

Training ran for 28 min 50 s on a single V100 (16 GB) GPU.

## Performance

We evaluated the model on the first 400 samples of the [XLCost-single-prompt test split](https://huggingface.co/datasets/giulio98/xlcost-single-prompt/viewer/Python/test), comparing the output of each generated program against the expected output using the pass@k metric.

| Metric | codegen-350M-multi-xlcost-v2 | codegen-350M-multi-xlcost | codegen-350M-mono (zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono (few-shot) |
|--------|-----|-----|-----|-----|-----|
| pass@1 | 3.325% | 3.70% | 0.4% | 0.35% | 0.48% |
| pass@10 | 15% | 14.5% | 3.5% | 3% | 3.75% |
| CodeBLEU | 20.18% | None | 15.15% | 19.42% | 20.27% |
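
The pass@k numbers can be reproduced with the Hugging Face `code_eval` metric; a minimal sketch with a toy problem (the test case and prediction below are placeholders, not items from the benchmark):

```Python
import os
os.environ["HF_ALLOW_CODE_EVAL"] = "1"  # code_eval executes generated code; opt in explicitly

import evaluate

code_eval = evaluate.load("code_eval")
pass_at_k, _ = code_eval.compute(
    references=["assert add(2, 3) == 5"],                # one test per problem
    predictions=[["def add(a, b):\n    return a + b"]],  # candidate solutions per problem
    k=[1],
)
print(pass_at_k)  # {'pass@1': 1.0}
```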

The [pass@k metric](https://huggingface.co/metrics/code_eval) estimates the probability that at least one out of k generated samples passes the unit tests.
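
For reference, pass@k is usually computed with the unbiased estimator from the Codex paper (Chen et al., 2021), where n samples are generated per problem and c of them pass:

```Python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```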

## Citations
```
@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint},
  year={2022}
}
```