Model Card for SantaFixer

SantaFixer is an LLM for code that focuses on generating bug fixes through infilling: given the code before and after a buggy span, it generates the replacement for that span.

Model Details

Model Description

How to Get Started with the Model

Use the code below to get started with the model.

# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "lambdasec/santafixer"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# trust_remote_code=True lets transformers run model code shipped with the checkpoint
model = AutoModelForCausalLM.from_pretrained(checkpoint,
              trust_remote_code=True).to(device)

# Infilling prompt: the model generates the code that belongs between
# the prefix and the suffix, after the <fim-middle> token
input_text = ("<fim-prefix>def print_hello_world():\n"
              "<fim-suffix>\n print('Hello world!')"
              "<fim-middle>")
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
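
For bug fixing, the infilling prompt is typically built by masking the suspect line and passing everything before it as the prefix and everything after it as the suffix. The following is a minimal sketch of that step, not part of the model's API: `build_fim_prompt` and the `read_config` snippet are hypothetical names introduced for illustration, and the sentinel tokens are copied from the example above.

```python
# Sentinel tokens copied from the example prompt in this card.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"


def build_fim_prompt(source: str, buggy_line: int) -> str:
    """Mask one line (1-indexed) of `source` and return an infilling prompt.

    Hypothetical helper for illustration; the model card does not define it.
    """
    lines = source.splitlines(keepends=True)
    if not 1 <= buggy_line <= len(lines):
        raise ValueError(f"buggy_line must be in 1..{len(lines)}")
    prefix = "".join(lines[: buggy_line - 1])  # code before the masked line
    suffix = "".join(lines[buggy_line:])       # code after the masked line
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Illustrative input: line 2 is the line we want the model to rewrite.
snippet = (
    "def read_config(path):\n"
    "    data = eval(open(path).read())\n"
    "    return data\n"
)
prompt = build_fim_prompt(snippet, buggy_line=2)
print(prompt)
```

The resulting `prompt` can be passed to `tokenizer.encode` in place of `input_text`; the text the model generates after `<fim-middle>` is the candidate replacement for the masked line.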

Training Details

  • GPU: Tesla P100
  • Time: ~5 hrs

Training Data

The model was fine-tuned on the CVE single-line fixes dataset.

Training Procedure

Supervised fine-tuning (SFT)

Training Hyperparameters

  • optim: adafactor
  • gradient_accumulation_steps: 4
  • gradient_checkpointing: true
  • fp16: false
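
The option names above match fields of `transformers.TrainingArguments`, so one plausible way to express them is sketched below. This is an assumption about how the settings could be wired up, not the author's training script: `build_training_arguments`, `effective_batch_size`, the output directory, and the per-device batch size are hypothetical, since the card does not state them.

```python
# Hyperparameters as listed in this card.
TRAINING_KWARGS = {
    "optim": "adafactor",
    "gradient_accumulation_steps": 4,
    "gradient_checkpointing": True,
    "fp16": False,
}


def build_training_arguments(output_dir: str):
    """Create TrainingArguments from the listed settings (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def effective_batch_size(per_device_batch_size: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step.

    The per-device batch size is not given in the card; it is an input here.
    """
    steps = TRAINING_KWARGS["gradient_accumulation_steps"]
    return per_device_batch_size * steps * num_devices


# Example with an assumed per-device batch size of 8 on a single GPU.
print(effective_batch_size(8))
```

Gradient accumulation and gradient checkpointing both trade extra compute for lower memory use, which fits training on a single Tesla P100.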

Evaluation

The model was tested with the GitHub top 1000 projects vulnerabilities dataset.

Evaluation results

All results are self-reported.

  • single-line infilling pass@1 on HumanEval: 0.470
  • single-line infilling pass@10 on HumanEval: 0.740
  • pass@1 (Java) on GH Top 1000 Projects Vulnerabilities: 0.260
  • pass@10 (Java) on GH Top 1000 Projects Vulnerabilities: 0.480
  • pass@1 (Python) on GH Top 1000 Projects Vulnerabilities: 0.310
  • pass@10 (Python) on GH Top 1000 Projects Vulnerabilities: 0.560
  • pass@1 (JavaScript) on GH Top 1000 Projects Vulnerabilities: 0.360
  • pass@10 (JavaScript) on GH Top 1000 Projects Vulnerabilities: 0.620
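
The card does not say how the pass@k figures were computed. For readers unfamiliar with the metric, the sketch below shows the widely used unbiased estimator, which draws n samples per problem, counts the c that pass, and estimates the chance that at least one of k samples passes. Treat it as an illustration of the metric under that assumption, not as the evaluation code behind the numbers above; `pass_at_k` is a name chosen here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sample budget (k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only, not data from this model's evaluation.
per_problem = [pass_at_k(n=10, c=3, k=1), pass_at_k(n=10, c=0, k=1)]
benchmark_score = sum(per_problem) / len(per_problem)
print(benchmark_score)
```

A benchmark-level score is the mean of the per-problem estimates, as in the last lines of the sketch.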