TheBloke/Phind-CodeLlama-34B-v2-GPTQ

as a followup, I extended the context window using these settings:

{
"_from_model_config": true,
"bos_token_id": 1,
"do_sample": true,
"pad_token_id": 2,
"eos_token_id": 2,
"max_new_tokens": 384,
"temperature": 0.1,
"top_p": 0.75,
"top_k": 40,
"max_seq_length": 16384,
"rope_freq_base": 1000000,
"compress_pos_emb": 4,
"gpu_split": "19,23",
"transformers_version": "4.33.1"
}

Fumbling around a bit from various sources on the proper way to do the longer sequence with the CodeLlama model, but it was able to digest a 1000-line/35k character file and analyze the code in a way that wasn't crazy in text-generation-webui with these settings, so I spawned them in my server version (which is just a fastapi wrapper around AutoModelForCausalLM.from_pretrained basically)

and re-tested

{'pass@1': 0.7134146341463414}

again hard to say where the variance is, and my impression is that the compress_pos_emb setting needed to extend the context properly has a small negative effect, so this wasn't surprising but it still seemed competent.

TheBloke
/

Phind-CodeLlama-34B-v2-GPTQ

HumanEval Results