---
language:
- en
license: apache-2.0
tags:
- solidity
- web3
- code generation
widget:
- text: "pragma solidity ^0.5.7;\n// Context: ParentA | Functions: helloA helloB | Constants: constantA \ncontract HelloWorld is ParentA {"
---
# A code autocomplete T5 model for Solidity
- Hello-world example of using this model. Note that the input `text` includes:
  - The Solidity version header, e.g. `pragma solidity ^0.5.7`
  - Ancestor class/library info, e.g. public functions and constants from `ParentA`
  - The contract/library/interface declaration header, e.g. `HelloWorld`, ending with `{`
```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("hululuzhu/solidity-t5")
model = T5ForConditionalGeneration.from_pretrained("hululuzhu/solidity-t5").to(device)

text = """pragma solidity ^0.5.7;
// Context: ParentA | Functions: helloA helloB | Constants: constantA
contract HelloWorld is ParentA {"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(device)

# Tune the beam/top-k/top-p parameters to get good results.
# Note: top_p and top_k only affect sampling; pass do_sample=True for them to take effect.
generated_ids = model.generate(input_ids, max_length=256, num_beams=5, top_p=0.95, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
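The three-part input format used in the example above (pragma line, `// Context:` comment, declaration header ending with `{`) can be assembled with a small helper. This is only a sketch: `build_prompt` and its parameter names are hypothetical and not part of this model or of `transformers`, and the way multiple parents are joined (spaces in the context line, commas in the `is` clause) is an assumption that the samples on this page do not confirm.

```python
def build_prompt(pragma, name, parents=(), functions=(), constants=(), kind="contract"):
    """Assemble a prompt in the layout shown in the hello-world example.

    Hypothetical helper: the layout is copied from the example text; the
    handling of several parents is an assumption.
    """
    # Context comment: ancestor names, their public functions, their constants
    context = "// Context: {} | Functions: {} | Constants: {}".format(
        " ".join(parents), " ".join(functions), " ".join(constants)
    )
    # Declaration header, e.g. "contract HelloWorld is ParentA"
    header = f"{kind} {name}"
    if parents:
        header += " is " + ", ".join(parents)
    # The prompt ends with the opening brace; the model generates the body
    return f"pragma solidity {pragma};\n{context}\n{header} {{"


text = build_prompt(
    "^0.5.7",
    "HelloWorld",
    parents=["ParentA"],
    functions=["helloA", "helloB"],
    constants=["constantA"],
)
print(text)
```

The resulting `text` can be passed to the tokenizer exactly as in the example above. For a library or interface, set `kind` accordingly.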
- Base T5 code model: https://huggingface.co/Salesforce/codet5-large
- Source data: https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts
- Processing steps: clean the source, separate it into contract-level segments, and split each segment into an input and an output
- Input sample after processing
```
pragma solidity 0.5.7;
// Context: PauserRole | Functions: isPauser addPauser renouncePauser | Constants:
contract Pausable is PauserRole {
```
- Output sample after processing (**note that the indentation is stripped; this is intentional, to reduce the token count**)
```
event Paused(address account);
event Unpaused(address account);
bool private _pausableActive;
bool private _paused;
constructor () internal {
_paused = false;
}
function paused() public view returns (bool) {
return _paused;
}
modifier whenNotPaused() {
require(!_paused);
_;
}
modifier whenPaused() {
require(_paused);
_;
}
function pause() public onlyPauser whenNotPaused whenPausableActive {
_paused = true;
emit Paused(msg.sender);
}
function unpause() public onlyPauser whenPaused whenPausableActive {
_paused = false;
emit Unpaused(msg.sender);
}
function _setPausableActive(bool _active) internal {
_pausableActive = _active;
}
modifier whenPausableActive() {
require(_pausableActive);
_;
}
}
```
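The split between the input and output samples above could be sketched roughly as follows. The actual training code has not been published, so this is an assumption about the approach rather than the author's method: `split_contract` is a hypothetical name, it only splits at the first `{` and strips indentation, and it does not reproduce the cleaning step or the construction of the pragma and `// Context:` lines.

```python
def split_contract(source):
    """Split one contract into (input header, output body) at the first '{'.

    Hypothetical sketch; the real preprocessing may differ.
    """
    head, brace, body = source.partition("{")
    if not brace:
        raise ValueError("no opening brace found")
    # Input: the declaration header, ending with the opening brace
    model_input = head.strip() + " {"
    # Output: the body with indentation and blank lines removed,
    # which shortens the target sequence
    lines = [line.strip() for line in body.splitlines()]
    model_output = "\n".join(line for line in lines if line)
    return model_input, model_output


source = """contract Pausable is PauserRole {
    bool private _paused;

    function paused() public view returns (bool) {
        return _paused;
    }
}
"""
model_input, model_output = split_contract(source)
print(model_input)
print(model_output)
```

As in the output sample above, the closing `}` of the contract stays in the output, so the model learns where the contract body ends.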
- Source training code: To be added