bart-large-code-instructiongen

Use this text2text model to find out what LLM instructions might be able to generate an arbitrary piece of code!

Check out a basic demo on Spaces
An example of how to use instructiongen models in a CLI script can be found here
You can find other models fine-tuned for instruction generation by searching for the instructiongen tag
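
A minimal usage sketch with the transformers library. It assumes the checkpoint is published on the Hugging Face Hub as pszemraj/bart-large-code-instructiongen (inferred from this card's title and the dataset namespace). The generation settings (max_new_tokens=64, num_beams=4) and the helper name generate_instruction are illustrative assumptions, not values taken from this card.

```python
MODEL_ID = "pszemraj/bart-large-code-instructiongen"  # assumed Hub id


def generate_instruction(code: str, max_new_tokens: int = 64, num_beams: int = 4) -> str:
    """Return one candidate instruction that could have produced `code`."""
    # Imported lazily so this module can be imported without loading transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # BART-large accepts at most 1024 input tokens, so longer snippets are truncated.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # assumption; eval outputs averaged ~31 tokens
        num_beams=num_beams,            # assumption; beam search is optional
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_instruction("def add(a, b):\n    return a + b") downloads the weights on first use and returns a single instruction string. Loading the model inside the function keeps the sketch short; a real script would load it once and reuse it.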

about

This model is a fine-tuned version of facebook/bart-large on the pszemraj/fleece2instructions-codealpaca dataset. It achieves the following results on the evaluation set:

Loss: 0.9222
Rouge1: 62.0692
Rouge2: 36.1947
Rougel: 57.5128
Rougelsum: 58.6613
Gen Len: 31.0060
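
To make the ROUGE numbers above easier to interpret, here is a simplified sketch of a ROUGE-N F1 score on a 0 to 100 scale. It assumes lowercased whitespace tokenization and no stemming, so it will not reproduce the reported values exactly; the function names are hypothetical and this is not the evaluation code used for this model.

```python
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 (0-100) from clipped n-gram overlap."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped matching n-gram count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100.0 * 2 * precision * recall / (precision + recall)
```

With n=1 this corresponds to the idea behind Rouge1 and with n=2 to Rouge2: the score rises as the generated instruction shares more word sequences with the reference instruction.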

Intended uses & limitations

🚨 note: as the authors elected to release the original dataset under cc-by-nc, the license carries over to this model, which cannot be used for commercial activity.

Intended use: Research on domain adaptation and/or other improvements to LLMs by extending instruction:text data pairs.
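
One way the intended use could look in practice is sketched below: unlabeled code snippets are paired with generated instructions to form new instruction:text records. The record keys ("instruction", "output"), the helper names, and the JSON Lines layout are hypothetical choices for illustration; generate stands for any callable that maps code to an instruction string, such as a wrapper around this model.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_pairs(snippets: Iterable[str], generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Pair each code snippet with a generated instruction."""
    pairs = []
    for code in snippets:
        instruction = generate(code).strip()
        if instruction:  # skip snippets for which nothing was generated
            pairs.append({"instruction": instruction, "output": code})
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    """Serialize the records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)
```

Generated instructions are model output, so any dataset built this way should be reviewed or filtered before it is used for training.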

Training and evaluation data

Refer to the linked dataset card for pszemraj/fleece2instructions-codealpaca or the original dataset repo.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 6e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 3.0
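
The learning-rate settings above can be made concrete with a small sketch of a cosine schedule with linear warmup. It assumes 1689 total optimizer steps (the final step in the training results), that the warmup ratio is converted to steps by rounding up, and a single half-cosine decay to zero. The function names are hypothetical and the actual trainer may differ in detail.

```python
import math

LEARNING_RATE = 6e-05   # learning_rate from the list above
WARMUP_RATIO = 0.03     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1689      # assumption: final step reported in the training results


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps, rounding up (an assumption)."""
    return math.ceil(total_steps * ratio)


def lr_at(step: int, total_steps: int = TOTAL_STEPS, peak: float = LEARNING_RATE) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions the rate climbs linearly for the first 51 steps, peaks at 6e-05, and then decays along a half cosine to zero at step 1689.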

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
1.0914	1.0	563	1.0303	60.288	34.1884	55.9293	57.0714	30.6267
0.8688	2.0	1126	0.9333	61.0409	34.9823	56.4887	57.6662	31.7255
0.6773	3.0	1689	0.9222	62.0692	36.1947	57.5128	58.6613	31.0060
