
Failure 2B Base

Live Demo

A quick, failed experiment at creating an SLM (small language model) that can code, based on Danube.

It scored 14.8% on HumanEval (FWIW, I personally recommend using a quantized 7B model for coding instead of an SLM). Open-sourcing for transparency.

Evaluation

I evaluated it on HumanEval, and it (pretty much) failed:

  • pass@1: 14.8%
  • pass@10: 26.2%

For context, this is slightly above the abilities of Llama 2 7B.
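The pass@1 and pass@10 numbers above are presumably the unbiased pass@k estimator used by the HumanEval benchmark; a minimal sketch of that formula (function name and the sample counts in the example are illustrative, not from this model's eval run):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased per-problem pass@k estimate: 1 - C(n - c, k) / C(n, k),
    where n is the number of sampled completions for a problem and
    c is how many of those samples pass the unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 20 samples drawn for a problem, 3 of which pass:
estimate = pass_at_k(20, 3, 1)  # ≈ 0.15
```

The benchmark score is the mean of this estimate over all 164 HumanEval problems.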

Details

  • Training dataset: ~200K high-quality code conversations, mostly English
  • Prompt format: ChatML
  • Training epochs: 4
  • Training type: Full tune
  • Learning rate: 0.0002
  • Optimizer: AdamW
  • Context length: 4096
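The ChatML prompt format listed above wraps each turn in `<|im_start|>`/`<|im_end|>` tokens. A minimal sketch of the formatting (the helper name is my own, not part of this repo):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a list of {"role": ..., "content": ...} turns as a ChatML
    prompt, ending with an open assistant turn for generation."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
])
```

The model then generates the assistant turn, which is terminated by `<|im_end|>`.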

Training Dataset

Approximately 200K conversations. Mostly code, but I threw in a couple thousand non-code conversations to help enhance its conversational abilities. It probably still won't be good at conversation: you may get strange results from normal chat, since this model is meant for code, not conversation.

Training Loss

Training Loss  Epoch  Step   Validation Loss
1.1211         0.0        1  1.3592
1.2619         0.25    1387  1.3490
1.0502         0.5     2774  1.2202
1.1113         0.75    4161  1.1432
0.8066         1.0     5548  1.0925
0.7926         1.23    6935  1.0899
0.7116         1.48    8322  1.0569
0.6674         1.73    9709  1.0283
0.7022         1.98   11096  1.0060
0.4038         2.22   12483  1.1227
0.3946         2.47   13870  1.1011
0.4063         2.72   15257  1.1020
0.3540         2.97   16644  1.1001
0.2783         3.2    18031  1.2364
0.3073         3.45   19418  1.2380
0.2984         3.7    20805  1.2399
0.2744         3.95   22192  1.2399
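Reading the numbers above, validation loss bottoms out near the end of epoch 2 (~1.006) and climbs afterward while training loss keeps falling, which suggests the later epochs overfit. A quick sketch of that check, using the (epoch, validation loss) pairs copied from the table:

```python
# (epoch, validation_loss) pairs from the training log above
val_losses = [
    (0.0, 1.3592), (0.25, 1.3490), (0.5, 1.2202), (0.75, 1.1432),
    (1.0, 1.0925), (1.23, 1.0899), (1.48, 1.0569), (1.73, 1.0283),
    (1.98, 1.0060), (2.22, 1.1227), (2.47, 1.1011), (2.72, 1.1020),
    (2.97, 1.1001), (3.2, 1.2364), (3.45, 1.2380), (3.7, 1.2399),
    (3.95, 1.2399),
]

# Find the checkpoint with the lowest validation loss
best_epoch, best_loss = min(val_losses, key=lambda p: p[1])
print(best_epoch, best_loss)  # → 1.98 1.006
```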

License

Feel free to use it under the Apache 2.0 license, which allows both commercial and non-commercial use with few restrictions.

DISCLAIMER: This model may generate offensive or inaccurate content; I am not liable for any outputs from this model. Guardrails have not been added.

