Training starcoder on my own repositories?

#28
by alexioc - opened

Hi,
I'm wondering whether it makes sense to fine-tune StarCoder on my own codebase, to get better and more contextual responses from the model.

An example of the kind of question I'd like to ask: "Create a Python integration module between mySystem1 and mySystem2 that allows all customer entities to be synced between the two systems"
Where:

  • mySystem1 and mySystem2 are two custom applications my team built, and I own both codebases
  • "customer entities" must be translated by the LLM into variable names based on the above codebases

Is fine-tuning a model like StarCoder the only way to reach this goal? If yes, how can I prepare my dataset to train it? If not, are there other ways to do it?

Cheers,
Alexio

@alexioc , have you explored creating the dataset artificially using the LLM itself? I am working on a similar task (fine-tuning StarCoder) and looking into the PEFT/LoRA/QLoRA options. Curious if you got your fine-tuning working as you needed?
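For anyone wondering why LoRA/QLoRA comes up here: LoRA freezes the base weights W and only trains a low-rank update, W_eff = W + (alpha/r)·B·A, where A is r×d_in and B is d_out×r, which makes fine-tuning a 15B model feasible on modest hardware. A back-of-the-envelope sketch of the parameter savings (the 4096×4096 shape and rank 8 are illustrative numbers, not StarCoder's actual layer dimensions):

```python
def lora_param_counts(d_in, d_out, r):
    """Trainable parameters for a full update of a d_in x d_out weight
    matrix vs. a rank-r LoRA update (A: r x d_in, B: d_out x r)."""
    full = d_in * d_out           # every weight is trainable
    lora = r * d_in + d_out * r   # only the two low-rank factors
    return full, lora

# Illustrative 4096 x 4096 projection with rank r = 8:
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

So a rank-8 adapter on that layer trains ~256x fewer parameters than full fine-tuning; QLoRA additionally keeps the frozen base weights quantized to cut memory further.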
