CodeGen-350M-multi-xlcost-v2

CodeGen-350M-multi-xlcost-v2 is a CodeGen model fine-tuned on the Python split of the XLCost dataset using DeepSpeed.

Usage

You can load the CodeGen-350M-multi-xlcost-v2 model and tokenizer directly in transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")

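# Prompt layout: EOS token, the task description wrapped in triple quotes, then a "###" separator line.
text = tokenizer.eos_token + "'''\n" + "function to add two numbers" + "\n'''\n" + "###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# max_length counts the prompt tokens as well as the newly generated ones.
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))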

Output:

'''
function to add two numbers 
'''
###
def add(a, b):
    return a + b
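
The prompt layout used above (EOS token, a triple-quoted description, then a `###` separator) can be wrapped in small helpers. This is a minimal sketch: `build_prompt` and `extract_completion` are hypothetical names, not part of the model repository, and the `"<|endoftext|>"` string is only a stand-in for `tokenizer.eos_token`.

```python
def build_prompt(description: str, eos_token: str) -> str:
    """Wrap a task description in the prompt layout shown in the usage example."""
    return f"{eos_token}'''\n{description}\n'''\n###\n"


def extract_completion(decoded: str) -> str:
    """Return the text after the first '###' separator line.

    If the separator is missing, the decoded text is returned unchanged.
    """
    _, sep, completion = decoded.partition("###\n")
    return completion if sep else decoded


# Stand-in EOS string (assumption); in practice pass tokenizer.eos_token.
prompt = build_prompt("function to add two numbers", "<|endoftext|>")

# Decoded text in the same shape as the sample output above.
decoded = "'''\nfunction to add two numbers\n'''\n###\ndef add(a, b):\n    return a + b\n"
print(extract_completion(decoded))
```

Passing `prompt` to the tokenizer in place of `text` gives the same input as the usage example, and `extract_completion` keeps only the generated function.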

Training

The model was fine-tuned on XLCost-single-prompt, an improved version of the original XLCost dataset xlcost-text-to-code. The hyperparameters are listed below.

| Hyperparameter | Value |
|---|---|
| Per-device train batch size | 16 |
| Context size | 1024 |
| Training steps | 259 |
| Gradient accumulation | 2 |
| Gradient checkpointing | True |
| Learning rate | 1.8e-05 |
| Weight decay | 0.1 |
| Warmup steps | 35 |
| Schedule | linear |
| ZeRO stage | 2 |

The DeepSpeed configuration is shown below:

{
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 0.000018,
      "betas": [
        0.9,
        0.999
      ],
      "eps": 1e-8,
      "weight_decay": 0.1
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 0.000018,
      "warmup_num_steps": 35
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": false
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 200000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 200000000,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": 2,
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 16,
  "gradient_clipping": 1,
  "wall_clock_breakdown": false
}

Training ran on a single V100 (16 GB) GPU for 28 min 50 s.
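
As a sanity check, the batch-size figures in the hyperparameters and the DeepSpeed configuration can be tied together with a little arithmetic. The sketch below assumes that "Training steps" counts optimizer steps (after gradient accumulation); the card does not state this explicitly. The token figure is only an upper bound that would hold if every sequence filled the full 1024-token context.

```python
# Values copied from the hyperparameter list and DeepSpeed config above.
per_device_batch = 16   # train_micro_batch_size_per_gpu
grad_accum = 2          # gradient_accumulation_steps
num_gpus = 1            # single V100
training_steps = 259    # assumed to be optimizer steps
warmup_steps = 35
context_size = 1024

# Effective batch per optimizer step; should equal train_batch_size (32).
effective_batch = per_device_batch * grad_accum * num_gpus

# Sequences processed over the whole run, under the assumption above.
sequences_seen = effective_batch * training_steps

# Upper bound on tokens, reached only if every sequence is full length.
max_tokens_seen = sequences_seen * context_size

# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / training_steps

print(f"effective batch size: {effective_batch}")
print(f"sequences seen:       {sequences_seen}")
print(f"max tokens seen:      {max_tokens_seen}")
print(f"warmup fraction:      {warmup_fraction:.1%}")
```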

Performance

We evaluated the model on the first 400 samples of the XLCost-single-prompt test split, comparing the output of each generated program with the expected output and reporting the pass@k metric.

| Metric | codegen-350M-multi-xlcost-v2 | codegen-350M-multi-xlcost | codegen-350M-mono (zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono (few-shot) |
|---|---|---|---|---|---|
| pass@1 | 3.325% | 3.70% | 0.4% | 0.35% | 0.48% |
| pass@10 | 15% | 14.5% | 3.5% | 3% | 3.75% |
| CodeBLEU | 20.18% | None | 15.15% | 19.42% | 20.27% |

The pass@k metric estimates the probability that at least one of k generated samples for a problem passes the tests.
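
One common way to compute pass@k is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass. The sketch below illustrates that estimator; it is an assumption that this form applies here, since the card states neither which estimator nor how many samples per problem were used. The function names and the sample counts in the example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for three problems, 10 samples each.
example = [(10, 0), (10, 1), (10, 10)]
print(mean_pass_at_k(example, k=1))
print(mean_pass_at_k(example, k=10))
```

With k = 1 the estimator reduces to the fraction of passing samples, c / n, which is why pass@1 is typically much lower than pass@10.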

Citations

@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint},
  year={2022}
}