gpt2-medium-finetuned-contract-gen

Overview

gpt2-medium-finetuned-contract-gen is a model specialized in generating Solidity contract code. It is fine-tuned from the gpt2-medium checkpoint (OpenAI's GPT-2, distributed through Hugging Face) on a set of Solidity contracts and patterns, and is intended to assist in drafting or suggesting contract structures.

Model Description

This model is designed specifically for generating Solidity contracts. As a derivative of gpt2-medium, it keeps the parent model's general text-generation ability while being specialized for understanding and generating Solidity-centric text.

Performance

The model achieved a loss of 0.3127 on the evaluation set (the validation loss at step 11000 in the training results).
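
For readers who prefer perplexity, the loss can be converted with a one-line calculation. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats, which the card does not state explicitly; under that assumption the perplexity comes out at roughly 1.37.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 0.3127

# Assumption: the loss is the mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.3f}")
```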

Intended Uses & Limitations

Intended Uses:

Assist developers by auto-generating contract code snippets based on prompts.
Help in understanding and drafting complex contract structures.

Limitations:

The generated code must be reviewed for security and functional correctness.
The clarity of the generated code largely depends on the specificity of the prompt.
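
As a first pass before manual review, generated output can be scanned for constructs that usually deserve a closer look. The sketch below is illustrative only: the function name `flag_for_review` and the pattern list are hypothetical, the list is far from exhaustive, and a clean result says nothing about whether the contract compiles or is safe.

```python
import re
from typing import Dict, List

# Illustrative (not exhaustive) patterns that usually deserve a closer look.
REVIEW_PATTERNS: Dict[str, str] = {
    r"\btx\.origin\b": "tx.origin used (unsafe for authorization)",
    r"\bselfdestruct\s*\(": "selfdestruct call",
    r"\.delegatecall\s*\(": "delegatecall to another contract",
    r"\.call\s*[({]": "low-level call (check return value and reentrancy)",
}

def flag_for_review(code: str) -> List[str]:
    """Return human-readable notes; an empty list means nothing was flagged."""
    notes = []
    if "pragma solidity" not in code:
        notes.append("no 'pragma solidity' version directive")
    for pattern, note in REVIEW_PATTERNS.items():
        if re.search(pattern, code):
            notes.append(note)
    return notes

sample = (
    "contract Wallet { address owner; "
    "function close() public { require(tx.origin == owner); "
    "selfdestruct(payable(owner)); } }"
)
for note in flag_for_review(sample):
    print("-", note)
```

A check like this only narrows down where a reviewer should look first; compiling the output and auditing it remain necessary.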

Training Details

Dataset

The model was fine-tuned on an undisclosed dataset comprising a range of Solidity contracts.

Training Hyperparameters:

Learning Rate: 5e-05
Train Batch Size: 4
Evaluation Batch Size: 4
Seed: 42
Optimizer: Adam (betas=(0.9,0.999), epsilon=1e-08)
Learning Rate Scheduler: Cosine with restarts
Warmup Steps: 241
Epochs: 4
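
To make the learning-rate settings concrete, the sketch below computes the rate at a given step for linear warmup followed by cosine decay with hard restarts. It assumes the scheduler is the Transformers `cosine_with_restarts` variant; the total step count and the number of cycles are not given in this card, so `TOTAL_STEPS` and `NUM_CYCLES` are hypothetical placeholders.

```python
import math

BASE_LR = 5e-05        # from the hyperparameters above
WARMUP_STEPS = 241     # from the hyperparameters above
TOTAL_STEPS = 19_300   # hypothetical: not stated in this card
NUM_CYCLES = 1         # hypothetical: not stated in this card

def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine with hard restarts."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    cosine = math.cos(math.pi * ((NUM_CYCLES * progress) % 1.0))
    return BASE_LR * max(0.0, 0.5 * (1.0 + cosine))

for step in (0, 120, 241, 5000, 11000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

With a single cycle the curve is a plain cosine decay after warmup; a larger `NUM_CYCLES` would reset the rate to its peak at the start of each cycle.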

Training Results:

Training Loss	Epoch	Step	Validation Loss
0.4744	0.21	1000	0.4736
0.467	0.41	2000	0.4146
0.4089	0.62	3000	0.3852
0.4018	0.83	4000	0.3688
0.3475	1.04	5000	0.3523
0.2751	1.24	6000	0.3434
0.2966	1.45	7000	0.3334
0.292	1.66	8000	0.3230
0.2899	1.87	9000	0.3200
0.2508	2.07	10000	0.3164
0.28	2.28	11000	0.3127
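
The epoch and step columns also allow a rough estimate of how many optimizer steps make up one epoch, which comes out at about 4,800. The sketch below uses the first and last rows of the table; the epoch values are rounded to two decimals, so the result is approximate, and the sequence-count estimate assumes a single device with no gradient accumulation, which this card does not confirm.

```python
# First and last (epoch, step) rows copied from the results table.
first_epoch, first_step = 0.21, 1000
last_epoch, last_step = 2.28, 11000

# Slope of step versus epoch gives optimizer steps per epoch.
steps_per_epoch = (last_step - first_step) / (last_epoch - first_epoch)

# Assumption (not stated in this card): one device, no gradient accumulation,
# so each step consumes exactly one batch of 4 sequences.
train_batch_size = 4
approx_sequences = steps_per_epoch * train_batch_size

print(f"~{steps_per_epoch:.0f} steps per epoch, ~{approx_sequences:.0f} training sequences")
```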

Dependencies:

Transformers: 4.31.0
PyTorch: 2.0.1+cu118
Datasets: 2.14.2
Tokenizers: 0.13.3

How to Use

To generate Solidity contract code with this model, load it through the Transformers library and sample from a prompt:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "ckandemir/gpt2-medium-finetuned-contract-gen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the prompt; this also returns the attention mask used by generate()
input_text = "contract MyToken"
inputs = tokenizer(input_text, return_tensors="pt")

# Sample one completion; max_length counts prompt tokens plus generated tokens
sample_output = model.generate(
    **inputs,
    do_sample=True,
    max_length=400,
    num_return_sequences=1,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

# Decode and print the generated text
generated_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
print(generated_text)
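
Because generation stops at max_length, the output often ends in the middle of a function or contract. One possible post-processing step is to cut the text at the last point where all opened braces have been closed. The helper below is a hypothetical sketch, not part of the model or of Transformers, and it counts braces naively, so braces inside strings or comments will confuse it.

```python
def trim_to_last_complete_block(code: str) -> str:
    """Cut the text after the last '}' that brings the brace depth back to zero."""
    depth = 0
    last_complete = -1
    for i, ch in enumerate(code):
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                last_complete = i
    # If no top-level block was ever closed, return the text unchanged.
    return code[: last_complete + 1] if last_complete >= 0 else code

truncated = "contract A { function f() public {} } contract B { uint256 x;"
print(trim_to_last_complete_block(truncated))
```

Applied to generated_text from the example above, this would keep only the contracts that were generated in full.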
