
Failure 2B Base

Live Demo

A quick, failed experiment at creating an SLM (small language model) that can code, based on Danube.

It scored 14.8% on HumanEval (FWIW, I personally recommend using a quantized 7B model for coding instead of an SLM). Open-sourcing for transparency.

Evaluation

I evaluated it on HumanEval, and it (pretty much) failed:

  • pass@1: 14.8%
  • pass@10: 26.2%

For context, this is slightly above the abilities of Llama 2 7B.
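The pass@1 and pass@10 numbers above are presumably the unbiased pass@k estimator used by the HumanEval benchmark; a minimal sketch of that formula (function name and the sample counts in the example are illustrative, not from this model's eval run):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased per-problem pass@k estimate: 1 - C(n - c, k) / C(n, k),
    where n is the number of sampled completions for a problem and
    c is how many of those samples pass the unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 20 samples drawn for a problem, 3 of which pass:
estimate = pass_at_k(20, 3, 1)  # ≈ 0.15
```

The benchmark score is the mean of this estimate over all 164 HumanEval problems.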

Details

  • Training dataset: ~200K high-quality code conversations, mostly English
  • Prompt format: ChatML
  • Training epochs: 4
  • Training type: Full tune
  • Learning rate: 0.0002
  • Optimizer: AdamW
  • Context length: 4096
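The ChatML prompt format listed above wraps each turn in `<|im_start|>`/`<|im_end|>` tokens. A minimal sketch of the formatting (the helper name is my own, not part of this repo):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a list of {"role": ..., "content": ...} turns as a ChatML
    prompt, ending with an open assistant turn for generation."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
])
```

The model then generates the assistant turn, which is terminated by `<|im_end|>`.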

Training Dataset

Approximately 200K conversations. Mostly code, but I threw in a couple thousand non-code conversations to help enhance its conversational abilities. It probably still won't be good at conversation: you may get strange results from normal chat, since this model is meant for code, not conversation.

Training Loss

Training Loss  Epoch  Step   Validation Loss
1.1211         0.0        1  1.3592
1.2619         0.25    1387  1.3490
1.0502         0.5     2774  1.2202
1.1113         0.75    4161  1.1432
0.8066         1.0     5548  1.0925
0.7926         1.23    6935  1.0899
0.7116         1.48    8322  1.0569
0.6674         1.73    9709  1.0283
0.7022         1.98   11096  1.0060
0.4038         2.22   12483  1.1227
0.3946         2.47   13870  1.1011
0.4063         2.72   15257  1.1020
0.3540         2.97   16644  1.1001
0.2783         3.2    18031  1.2364
0.3073         3.45   19418  1.2380
0.2984         3.7    20805  1.2399
0.2744         3.95   22192  1.2399
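Reading the numbers above, validation loss bottoms out near the end of epoch 2 (~1.006) and climbs afterward while training loss keeps falling, which suggests the later epochs overfit. A quick sketch of that check, using the (epoch, validation loss) pairs copied from the table:

```python
# (epoch, validation_loss) pairs from the training log above
val_losses = [
    (0.0, 1.3592), (0.25, 1.3490), (0.5, 1.2202), (0.75, 1.1432),
    (1.0, 1.0925), (1.23, 1.0899), (1.48, 1.0569), (1.73, 1.0283),
    (1.98, 1.0060), (2.22, 1.1227), (2.47, 1.1011), (2.72, 1.1020),
    (2.97, 1.1001), (3.2, 1.2364), (3.45, 1.2380), (3.7, 1.2399),
    (3.95, 1.2399),
]

# Find the checkpoint with the lowest validation loss
best_epoch, best_loss = min(val_losses, key=lambda p: p[1])
print(best_epoch, best_loss)  # → 1.98 1.006
```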

License

Feel free to use it under the Apache 2.0 license, which allows both commercial and non-commercial use with few restrictions.

DISCLAIMER: This model may generate offensive or inaccurate content; I am not liable for any outputs from this model. Guardrails have not been added.

