---
pretty_name: "ComBack"
language: 
  - code
tags:
  - C++/C Code
  - Compiler Backend
license: "cc-by-4.0"
---


# ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency

ComBack is a large-scale multi-platform compiler backend code dataset.
This repository contains all fine-tuned models and scripts for reproducing the experimental results.


## Dataset Information

Details can be found at https://huggingface.co/datasets/docz-ict/ComBack

## Task Example

  - Statement-Level Completion: complete the current statement.
  ```c++
  //Inputs:
  ...
  adjustReg(MBB,LastFrameDestroy, DL, SPReg, FPReg, -StackSize+RVFI->getVarArgsSaveSize() 
  //Ground Truth:
  MachineInstr::FrameDestroy);
  ```

  - Next-Statement Suggestion: predict the next statement.

  ```c++
  //Inputs:
  ...
  maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask;
  //Ground Truth:
  MFI -> setMaxCallFrameSize(maxCallFrameSize);
  ```


  - Code Generation: generate a function from its natural-language description and target-specific values.

  ```c++
  //Inputs:
  getPointerRegClass: Returns a TargetRegisterClass used for pointer values.
  Target-Specific Value: Sparc, SP::I64RegsRegClass, SP::IntRegsRegClass.
  //Ground Truth:
  TargetRegisterClass *SparcRegisterInfo::getPointerRegClass(MachineFunction &MF ,unsigned Kind) {
      return Subtarget.is64Bit() ? &SP::I64RegsRegClass : &SP::IntRegsRegClass;
  }
  ```
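The two completion tasks can be pictured as slicing a function body into statements. The sketch below is a hypothetical illustration of the input/ground-truth format only: the helper names, the `cut` position, and the newline joining are assumptions, not the procedure used to build ComBack.

```python
from typing import List, Tuple


def next_statement_sample(statements: List[str], i: int) -> Tuple[str, str]:
    """Next-Statement Suggestion (hypothetical helper): the statements before
    index i form the input, and statement i is the ground truth."""
    if not 0 < i < len(statements):
        raise ValueError("i must leave at least one statement of context")
    return "\n".join(statements[:i]), statements[i]


def statement_completion_sample(statements: List[str], i: int, cut: int) -> Tuple[str, str]:
    """Statement-Level Completion (hypothetical helper): the context plus the
    first `cut` characters of statement i form the input; the remainder of
    statement i is the ground truth."""
    context, target = next_statement_sample(statements, i)
    if not 0 < cut < len(target):
        raise ValueError("cut must split the statement into two non-empty parts")
    return context + "\n" + target[:cut], target[cut:]
```

For example, with the two statements from the Next-Statement Suggestion example, `next_statement_sample(stmts, 1)` returns the `maxCallFrameSize = ...` line as input and the `setMaxCallFrameSize` call as ground truth.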



## 1. Dependency

- Python version == 3.8.1
- Install the dependencies: `pip install -r requirements.txt`

## 2. Fine-Tuning
We fine-tuned six pre-trained code language models on 8 Tesla V100 GPUs (16GB each).
You can fine-tune each model on our datasets by running:

```shell
# Model Type Options: CodeBert, GraphCodeBert, UnixCoder, CodeT5, NatGen, CodeT5+
# Task Options: code-generation, code-completion, new-target-completion(Only for CodeT5+), new-target-generation(Only for CodeT5+)
bash ./Script/Model/{Model Type}/{Task}/run_fine_tuning*.sh
```
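Only some {Model Type}/{Task} combinations exist: the two new-target tasks are listed for CodeT5+ only. The Python sketch below enumerates the combinations implied by those comments and prints the matching commands instead of running them. It assumes the directory names match the option strings exactly (including the `+` in `CodeT5+`); the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterator

# Option strings copied from the comments in the command above.
MODELS = ["CodeBert", "GraphCodeBert", "UnixCoder", "CodeT5", "NatGen", "CodeT5+"]
TASKS = ["code-generation", "code-completion"]
CODET5P_ONLY_TASKS = ["new-target-completion", "new-target-generation"]


def fine_tuning_dirs(root: str = "./Script/Model") -> Iterator[Path]:
    """Yield every {Model Type}/{Task} directory implied by the options."""
    for model in MODELS:
        tasks = TASKS + (CODET5P_ONLY_TASKS if model == "CodeT5+" else [])
        for task in tasks:
            yield Path(root) / model / task


def fine_tuning_scripts(root: str = "./Script/Model") -> Iterator[Path]:
    """Yield the run_fine_tuning*.sh scripts that actually exist on disk."""
    for d in fine_tuning_dirs(root):
        yield from sorted(d.glob("run_fine_tuning*.sh"))


if __name__ == "__main__":
    # Print the commands so they can be reviewed before launching any job.
    for script in fine_tuning_scripts():
        print("bash", script)
```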



## 3. Reproducing Results in Table.2 

### Dataset split scheme
The data of all 178 backends is split into train/valid/test sets in a ratio of 80%:10%:10%.

  - Dataset Info
  
  | Task | Train | Valid | Test |
  | ---- | ---- | ---- | ---- |
  | Statement-Level Comp. | 128,899(11.36M Token) | 16,112(1.43M Token)  | 16,113(1.43M Token) |
  | Next-Statement Sugg. | 173,052(15.69M Token) | 21,631(1.99M Token)   | 21,632(1.98M Token) |
  | Code Generation | 36,236(5.10M Token) | 4,530(0.64M Token)  | 4,530(0.64M Token) |
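A random 80%:10%:10% split like the one described above can be sketched as follows. The shuffling, the seed, and the rounding rule are assumptions, and the description does not say whether the split is done per backend or over the pooled data; this sketch pools all samples.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then split into train/valid/test at roughly 80%:10%:10%.

    Integer arithmetic keeps the three parts summing to len(samples);
    the test set absorbs any remainder.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])
```

For the 161,124 Statement-Level Completion samples (the sum of that row), this rounding gives 128,899 / 16,112 / 16,113, the same sizes as the table; for other rows the sizes may differ by a sample.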

### Reproducing results in Table.2 by running:

```shell
# Model Type Options: CodeBert, GraphCodeBert, UnixCoder, CodeT5, NatGen, CodeT5+
# Task Options: code-generation, code-completion
bash ./Script/Model/{Model Type}/{Task}/run_test.sh
```
### Results

  - Without Fine-Tuning
    |               | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. |
    |-------------|:-----------------:|:-----------------:|:----------------:|:----------------:|:----------:|:----------:|
    |    **Model**    |         EM        |         ED        |        EM        |        ED        |    BLEU4   |     ED     |
    | CodeBert-c      |        0.00       |        0.97       |       0.00       |       1.31       |     0.00    |     0.44    |
    | GraphCodeBert-c |        0.00       |        0.35       |       0.00       |       0.54       |     0.00    |     2.41    |
    | UnixCoder-base-nine     |        0.07       |       27.56       |        15.93       |        29.11       |    0.00    |    31.81   |
    | CodeT5-base        |        0.65       |       21.45       |        7.23       |        23.50       |     0.00    |     13.57    |
    | NatGen        |        0.00       |       13.52       |       0.02       |       15.95      |     0.01    |     28.76    |
    | CodeT5+-220m       |        0.02       |        7.24       |       0.12       |       9.87       |    0.00    |    12.33   |

  - Fine-Tuned
    |               | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. |
    |-------------|:-----------------:|:-----------------:|:----------------:|:----------------:|:----------:|:----------:|
    | **Model**         |         EM        |         ED        |        EM        |        ED        |    BLEU4   |     ED     |
    | CodeBert-c      |       53.84       |       77.44       |       52.67      |       70.82      |     23.54    |     54.63    |
    | GraphCodeBert-c |       43.00       |       71.89       |       47.10      |       61.31      |     20.73    |     48.83    |
    | UnixCoder-base-nine     |     **67.84**     |     **85.06**     |        58.51       |        75.31       |    56.24   |    73.45   |
    | CodeT5-base        |       66.38       |       84.34       |        58.52       |        76.03       |     70.87    |     80.45    |
    | NatGen        |       67.47       |       84.83       |     **60.30**    |     **76.84**    |     71.73    |     81.39    |
    | CodeT5+-220m       |       66.93       |       84.45       |       59.57      |       76.41      |  **75.28** |  **82.95** |
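The tables report Exact Match (EM), ED, and BLEU4, and higher is better for all three (the best scores are bolded), so ED here reads as an edit-similarity score. The sketch below assumes EM compares whitespace-normalized strings and ED is 100 × (1 − Levenshtein distance / length of the longer string); both definitions are assumptions, and the repository's `run_test.sh` evaluation remains the reference.

```python
from typing import List, Tuple


def normalize(code: str) -> str:
    """Collapse runs of whitespace into single spaces (assumed normalization)."""
    return " ".join(code.split())


def exact_match(pred: str, ref: str) -> float:
    return 100.0 if normalize(pred) == normalize(ref) else 0.0


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_similarity(pred: str, ref: str) -> float:
    p, r = normalize(pred), normalize(ref)
    if not p and not r:
        return 100.0
    return 100.0 * (1 - levenshtein(p, r) / max(len(p), len(r)))


def corpus_scores(preds: List[str], refs: List[str]) -> Tuple[float, float]:
    """Average EM and ED over a test set."""
    n = len(refs)
    em = sum(exact_match(p, r) for p, r in zip(preds, refs)) / n
    ed = sum(edit_similarity(p, r) for p, r in zip(preds, refs)) / n
    return em, ed
```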




## 4. Reproducing Results in Table.3 





### Dataset split scheme

The data of RISC-V, ARC, and NVPTX in both GCC and LLVM is used as the test set. The data of all other CPU, MPU, and GPU targets, excluding RI5CY (RI5CY is customized based on RISC-V), is split into train/valid sets in a ratio of 85%:15%.


  - Dataset Info


    | Task | Train | Valid | Test |
    | ---- | ---- | ---- | ---- |
    | Statement-Level Comp. | 114,016(10.20M Token) | 20,121(1.81M Token)  | 6,645(0.58M Token) |
    | Next-Statement Sugg. | 152,114(14.10M Token) | 26,844(2.49M Token)   | 9,313(0.83M Token) |
    | Code Generation | 30,633(4.44M Token) | 5,406(0.79M Token)  | 2,819(0.37M Token) |
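The target-based split above can be sketched as a filter on a per-sample target label. The `target` field, the target spellings, the shuffle, and the seed are assumptions about the data layout, not the repository's code.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, str]

# Target names taken from the split description; the dataset itself may
# spell them differently (assumption).
TEST_TARGETS = {"RISC-V", "ARC", "NVPTX"}
EXCLUDED_TARGETS = {"RI5CY"}


def split_by_target(samples: Sequence[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Hold out whole targets as the test set, drop excluded targets, and
    split the remaining samples into train/valid at 85%:15%."""
    # Filtering on the target name alone removes a held-out target from
    # training in both GCC and LLVM.
    test = [s for s in samples if s["target"] in TEST_TARGETS]
    rest = [s for s in samples if s["target"] not in TEST_TARGETS | EXCLUDED_TARGETS]
    random.Random(seed).shuffle(rest)
    n_valid = len(rest) * 15 // 100
    return rest[n_valid:], rest[:n_valid], test
```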



### Input examples for ChatGPT-3.5-Turbo and Code-LLaMA-34B-Instruct
**Statement-Level Completion**
```cpp
//Prompt: Complete the last statement of this code snippet:
...
adjustReg(MBB,LastFrameDestroy, DL, SPReg, FPReg, -StackSize+RVFI->getVarArgsSaveSize() 
```
**Next-Statement Suggestion**
```cpp
//Prompt: Predict the next statement of this code snippet:
...
maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask;
```
**Code Generation**
```cpp
//Prompt: Create a function named "getPointerRegClass" for "Sparc" backend of LLVM Compiler. 
//The description of this function is "Returns a TargetRegisterClass used for pointer values". 
//It contains "Sparc", "SP::I64RegsRegClass", "SP::IntRegsRegClass" as target specific values.
```
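The three prompt templates above can be generated from a code snippet, or from a function name, description, and list of target-specific values. This sketch assumes the `//` prefixes and line breaks are part of the prompt text and that GCC prompts only swap the compiler name; both are assumptions, and the helper names are hypothetical.

```python
from typing import Sequence


def completion_prompt(code: str) -> str:
    return "//Prompt: Complete the last statement of this code snippet:\n" + code


def next_statement_prompt(code: str) -> str:
    return "//Prompt: Predict the next statement of this code snippet:\n" + code


def generation_prompt(name: str, backend: str, description: str,
                      values: Sequence[str], compiler: str = "LLVM") -> str:
    """Build the Code Generation prompt; each target-specific value is quoted."""
    quoted = ", ".join(f'"{v}"' for v in values)
    return (f'//Prompt: Create a function named "{name}" for "{backend}" backend of {compiler} Compiler.\n'
            f'//The description of this function is "{description}".\n'
            f'//It contains {quoted} as target specific values.')
```

Calling `generation_prompt("getPointerRegClass", "Sparc", "Returns a TargetRegisterClass used for pointer values", ["Sparc", "SP::I64RegsRegClass", "SP::IntRegsRegClass"])` reproduces the Code Generation prompt shown above, without trailing spaces.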


### Reproducing results in Table.3 by running:
```shell
# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_existing_type.sh

# ChatGPT
bash ./Script/Exp_Script/ChatGPT/run_chatgpt.sh

# Code-LLaMA
bash ./Script/Exp_Script/ChatGPT/run_codellama.sh
```



### Results


  - GCC

  |            | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
  |----------|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
  |            |    RISC-V   |    RISC-V   |     ARC     |     ARC     |    NVPTX    |    NVPTX    |    RISC-V   |    RISC-V   |     ARC     |     ARC     |    NVPTX    |    NVPTX    |   RISC-V   |   RISC-V   |     ARC    |     ARC    |    NVPTX   |    NVPTX   |
  | Model      |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |     BLEU4     |     ED     |     BLEU4     |     ED     |     BLEU4     |     ED     |
  | ChatGPT-3.5-Turbo    |    10.34    |    38.41    |    15.35    |    42.94    |    12.01    |    41.47    |     6.44    |     12.9    |     9.75    |    20.79    |     7.97    |    17.79    |   1.37  | 24.12 | 1.67  | 28.26 | 1.57 | 26.97 |
  | Code-LLaMA-34B |     0.41    |    19.07    |     0.85    |    16.77    |     0.56    |    18.22    |     1.58    |    13.54    |     2.66    |    17.95    |     2.47    |    16.59    |    1.67  | 27.89  | 1.71  | 30.49  | 1.57 |  27.65 |
  | CodeT5+-220m    |    **51.16**    |    **75.32**    |    **52.45**    |    **74.57**    |    **50.56**    |    **75.52**    |     **49.11**     |     **67.84**     |     **38.26**     |     **59.21**     |     **38.33**     |     **56.31**     |     **32.56**    |     **58.67**    |     **19.94**    |     **50.27**    |     **25.47**    |     **52.60**    |


  - LLVM

  |            | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
  |----------|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
  |            |    RISC-V   |    RISC-V   |     ARC     |     ARC     |    NVPTX    |    NVPTX    |    RISC-V   |    RISC-V   |     ARC     |     ARC     |    NVPTX    |    NVPTX    |   RISC-V   |   RISC-V   |     ARC    |     ARC    |    NVPTX   |    NVPTX   |
  | Model      |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |      EM     |      ED     |     BLEU4     |     ED     |     BLEU4     |     ED     |     BLEU4     |     ED     |
  | ChatGPT-3.5-Turbo    |     12.08     |     41.39     |     16.77     |     42.02     |     14.73     |     43.72     |     9.80     |    21.86    |    10.81    |    20.66    |    11.39    |    22.82    |    1.23 | 25.12 | 1.30 | 27.19 | 1.43 | 25.45 |
  | Code-LLaMA-34B |     0.45     |    17.61    |     0.61    |    17.21    |     0.99    |    17.23    |     1.75    |    15.04    |     0.42    |    11.27    |     2.42    |    16.25    |   1.43 | 27.24 | 1.61 | 32.12 | 1.59 | 28.08  |
  | CodeT5+-220m    |     **62.68**     |     **82.02**     |     **71.34**     |     **85.98**     |     **64.45**     |     **81.53**     |     **48.71**     |     **68.95**     |     **58.68**     |     **74.57**     |     **47.81**     |     **65.5**     |     **50.34**    |     **72.98**    |     **55.38**    |     **74.41**    |     **44.33**    |     **66.36**    |





## 5. Reproducing Results in Figure.6 

### Reproducing results in Figure.6 by running:
```shell
# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_existing_type.sh

# Fork-Flow
bash ./Script/Exp_Script/ForkFlow/run_forkflow.sh
```


### Results


  - GCC

  |  | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX |
  |--------------  |-------  |-------  |-------  |------  |-------  |-------  |
  | Method | BLEU4 | ED | BLEU4 | ED | BLEU4 | ED |
  | ForkFlow Avg | 3.48 | 5.79 | 1.77 | 3.73 | 4.7 | 3.81 |
  | ForkFlow Max | 28.77 | 34.8 | 4.94 | 8.85 | 4.7 | 3.81 |
  | CodeT5+ | 32.56 | 58.67 | 19.94 | 50.27 | 25.47 | 52.6 |


  - LLVM

  |  | RISC-V | RISC-V | ARC | ARC | NVPTX | NVPTX |
  |--------------  |-------  |-------  |-------  |-------  |-------  |-------  |
  | Method | BLEU4 | ED | BLEU4 | ED | BLEU4 | ED |
  | ForkFlow Avg | 12.45 | 22.18 | 19.98 | 33.43 | 15.06 | 28.73 |
  | ForkFlow Max | 27.32 | 46.47 | 41.8 | 60.62 | 18.81 | 39.04 |
  | CodeT5+ | 50.34 | 72.98 | 55.38 | 74.41 | 44.33 | 66.36 |















## 6. Reproducing Results in Table.4



### Dataset split scheme


The data of ARC and NVPTX in both GCC and LLVM is used as the test set. The data of CPU targets, excluding RISC-V and RI5CY, is split into train/valid sets in a ratio of 85%:15%.

  - Dataset Info


    | Task | Train | Valid | Test |
    | ---- | ---- | ---- | ---- |
    | Statement-Level Comp. | 87,018(7.78M Token) | 15,357(1.37M Token)  | 2,764(0.26M Token) |
    | Next-Statement Sugg. | 113,684(10.65M Token) | 20,063(1.87M Token)   | 4,029(0.38M Token) |
    | Code Generation | 21,184(3.14M Token) | 3,739(0.55M Token)  | 1,372(0.18M Token) |


### Reproducing results in Table.4 by running:
```shell
# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_new_type.sh
```



### Results

  - GCC

    |  | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
    |:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:----------:  |:----------:  |:----------:  |:----------:  |
    |  | ARC(MPU) | ARC(MPU) | NVPTX(GPU) | NVPTX(GPU) | ARC(MPU) | ARC(MPU) | NVPTX(GPU) | NVPTX(GPU) | ARC(MPU) | ARC(MPU) | NVPTX(GPU) | NVPTX(GPU) |
    | Dataset | EM | ED | EM | ED | EM | ED | EM | ED | BLEU4 | ED | BLEU4 | ED |
    | -w GPU and MPU | 52.45 | 74.57 | 50.56 | 75.52 | 38.26 | 59.21 | 38.33 | 56.31 | 19.94 | 50.27 | 25.47 | 52.6 |
    | -w/o GPU and MPU | 50.53| 74.09 | 46.37 | 72.45 | 37.22 | 58.21 | 38.33 | 56.83 | 19.29 | 49.12 | 22.46 | 50.33 |
    |  **Decrease**  |  **1.92**  |  **0.48**  |  **4.19**  |  **3.07**  |  **1.04**  |  **1.00**  |  **0.00**  |  **-0.52**  |  **0.65**  |  **1.15**  |  **3.01**  |  **2.27**  |

  - LLVM
    |  | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. | Code Gen. | Code Gen. |
    |------------------  |:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:----------:  |:----------:  |:----------:  |:----------:  |
    |  | ARC(MPU) | ARC(MPU) | NVPTX(GPU) | NVPTX(GPU) | ARC(MPU) | ARC(MPU) | NVPTX(GPU) | NVPTX(GPU) | ARC(MPU) | ARC(MPU) | NVPTX(GPU) | NVPTX(GPU) |
    | Dataset | EM | ED | EM | ED | EM | ED | EM | ED | BLEU4 | ED | BLEU4 | ED |
    | -w GPU and MPU | 71.34 | 85.98 | 64.45 | 81.53 | 58.68 | 74.57 | 47.81 | 65.50 | 55.38 | 74.41 | 44.33 | 66.36 |
    | -w/o GPU and MPU | 69.82 | 85.59 | 60.04 | 79.85 | 58.26 | 73.75 | 46.28 | 63.92 | 49.62 | 70.26 | 42.94 | 65.43 |
    |  **Decrease**  |  **1.52**  |  **0.39**  |  **4.41**  |  **1.68**  |  **0.42**  |  **0.82**  |  **1.53**  |  **1.58**  |  **5.76**  |  **4.15**  |  **1.39**  |  **0.93**  |
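Each **Decrease** cell is the score with GPU and MPU data minus the score without them, so a negative cell means the model trained without GPU and MPU data scored higher on that metric. A minimal sketch of the computation (the function name is illustrative):

```python
from typing import List


def decrease(with_scores: List[float], without_scores: List[float]) -> List[float]:
    """Per-column drop from removing GPU and MPU training data (w minus w/o),
    rounded to two decimals as in the tables."""
    if len(with_scores) != len(without_scores):
        raise ValueError("rows must have the same number of columns")
    return [round(w - wo, 2) for w, wo in zip(with_scores, without_scores)]
```

Applied to the two LLVM rows above, it returns 1.52, 0.39, 4.41, 1.68, 0.42, 0.82, 1.53, 1.58, 5.76, 4.15, 1.39, and 0.93, matching the LLVM **Decrease** row.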



## 7. Reproducing Results in Table.5



### Dataset split scheme


The data of RI5CY in LLVM is used as the test set. The data of CPU targets is split into train/valid sets in a ratio of 85%:15% under two settings: excluding RISC-V and including RISC-V.

  - Dataset Info
    - Excluding RISC-V

    | Task | Train | Valid | Test |
    | ---- | ---- | ---- | ---- |
    | Statement-Level Comp. | 87,018(7.78M Token) | 15,357(1.37M Token)  | 721(0.04M Token) |
    | Next-Statement Sugg. | 113,684(10.65M Token) | 20,063(1.87M Token)   | 1,035(0.06M Token) |
    | Code Generation | 21,184(3.14M Token) | 3,739(0.55M Token)  | 219(0.02M Token) |

    - Including RISC-V
  
    | Task | Train | Valid | Test |
    | ---- | ---- | ---- | ---- |
    | Statement-Level Comp. | 90,316(8.06M Token) | 15,940(1.42M Token)  | 721(0.04M Token) |
    | Next-Statement Sugg. | 118,175(11.04M Token) | 20,856(1.94M Token)   | 1,035(0.06M Token) |
    | Code Generation | 22,413(3.30M Token) | 3,957(0.58M Token)  | 219(0.02M Token) |



### Reproducing results in Table.5 by running:
```shell
# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_itr_exp.sh
```



### Results

  | | Stmt. Comp. | Stmt. Comp. | Next Sugg. | Next Sugg. | Code Gen. | Code Gen. |
  |:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:----------: |:----------: |
  | Dataset | EM | ED | EM | ED | BLEU4 | ED |
  | -w/o RISC-V | 66.16 | 83.79 | 57.29 | 74.73 | 54.41 | 75.41 |
  | -w RISC-V | 74.06 | 87.91 | 67.25 | 81.28 | 79.46 | 89.92 |
  | **Diff** | **7.90** | **4.12** | **9.96** | **6.55** | **25.05** | **14.51** |





## Citation
```bibtex
@inproceedings{zhong2024comback,
  title={ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency},
  author={Zhong, Ming and Lyu, Fang and Wang, Lulin and Geng, Hongna and Qiu, Lei and Cui, Huimin and Feng, Xiaobing},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}
```