Dataset

#1
by titan087 - opened

Hey,

What dataset did you use to fine-tune this model? I was looking for one to fine-tune CodeLlama 34B and haven't found one that looked good.

Thanks!

Same here, so I settled on a benchmark dataset:
https://huggingface.co/datasets/codeparrot/xlcost-text-to-code

The JavaScript subsection has about 10K rows, which I felt was enough for a fine-tune. Let me know your thoughts as well.
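If it helps, here is a minimal sketch of how a row from that dataset could be turned into a fine-tuning sample. The `text` and `code` column names follow the xlcost-text-to-code dataset card; the instruction-style prompt template is just one common convention, not something the dataset prescribes.

```python
# Sketch: turn one xlcost-text-to-code row into an instruction-style
# training example. The prompt template is an assumption; adjust it to
# whatever format your fine-tuning framework expects.

def format_example(row: dict) -> str:
    """Build a single fine-tuning sample from a text-to-code pair."""
    return (
        "### Instruction:\n"
        f"{row['text']}\n\n"
        "### Response:\n"
        f"{row['code']}"
    )

# Hypothetical row mimicking the dataset's "text" and "code" columns.
sample = {
    "text": "Print numbers 1 to 3",
    "code": "for (let i = 1; i <= 3; i++) console.log(i);",
}
print(format_example(sample))
```

In practice you would map this over the JavaScript subset loaded with `datasets.load_dataset` and tokenize the resulting strings.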

It's worth a shot. For a basic test I can try training Gemma, Llama 3, or potentially Phi-3, at least to start with. If it works well enough, then scale it up to one of the coding-focused 34B models.
