---
license: apache-2.0
datasets:
- AISE-TUDelft/Capybara
tags:
- code
---

# BinT5 

- **Repository:** https://github.com/AISE-TUDelft/Capybara-BinT5
- **Paper:** https://huggingface.co/papers/2301.01701
- **Point of Contact:** https://huggingface.co/aalkaswan
- **Raw Data:** https://zenodo.org/records/7229913

BinT5 is a binary code summarization model. It is based on [CodeT5](https://huggingface.co/Salesforce/codet5-base) and fine-tuned on the [Capybara](https://huggingface.co/datasets/AISE-TUDelft/Capybara) dataset.

We offer five variants of the model, one per training configuration (a minimal usage sketch follows the table):

| Name                                                | Training Data                                        |
|-----------------------------------------------------|------------------------------------------------------|
| [BinT5-C](https://huggingface.co/AISE-TUDelft/BinT5-C)         | C Source                                             |
| [BinT5-Decom](https://huggingface.co/AISE-TUDelft/BinT5-Decom)     | Decompiled C Binaries                                |
| [BinT5-Stripped](https://huggingface.co/AISE-TUDelft/BinT5-Stripped)  | Stripped Decompiled C Binaries                       |
| [BinT5-Demi](https://huggingface.co/AISE-TUDelft/BinT5-Demi)      | Demi-stripped Decompiled C Binaries                  |
| [BinT5-NoFunName](https://huggingface.co/AISE-TUDelft/BinT5-NoFunName) | Decompiled C Binaries with the Function Name removed |
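
The sketch below shows how one of these checkpoints might be loaded and run with the `transformers` library. It assumes the checkpoints expose the standard CodeT5/T5 seq2seq interface; the input `code` string and generation parameters are illustrative, and the exact preprocessing of (decompiled) functions used in the paper is documented in the repository linked above.

```python
# Minimal sketch: load a BinT5 variant and summarize a function.
# Assumes the standard transformers seq2seq API for CodeT5-style checkpoints.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "AISE-TUDelft/BinT5-C"  # pick any variant from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input; the paper's pipeline feeds (decompiled) C functions.
code = "int add(int a, int b) { return a + b; }"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

summary_ids = model.generate(**inputs, max_new_tokens=30, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```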

### Citation Information

```
@inproceedings{alkaswan2023extending,
  title={Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries},
  author={Al-Kaswan, Ali and Ahmed, Toufique and Izadi, Maliheh and Sawant, Anand Ashok and Devanbu, Premkumar and van Deursen, Arie},
  booktitle={2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={260--271},
  year={2023},
  organization={IEEE}
}
```