Can I fine-tune it on another language? What should the dataset structure be?

#6
by smjain - opened

Can I fine-tune it on another language? What should the dataset structure be for that? Can you provide some pointers?

You can find a script for fine-tuning SantaCoder here. It lets you fine-tune on text datasets, such as the other programming languages in The Stack, but there's no guarantee that the model can pick up a language it wasn't pre-trained on through fine-tuning alone. You could also try other tasks, such as Python-to-text translation with this dataset, or code classification tasks from CodeXGLUE (some fine-tuning scripts are available here).
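On the dataset structure question, here is a rough sketch of one way to prepare your own data, not the script's required format. It assumes the fine-tuning script reads a single text column per example, and it uses `content` as the column name because that is what The Stack uses; check the script's arguments before relying on it. The helper names (`build_records`, `write_jsonl`, `read_jsonl`) are made up for illustration.

```python
import json
from pathlib import Path

# Assumption: one text column per example. "content" mirrors The Stack's
# column name; the fine-tuning script you use may expect something else.
TEXT_COLUMN = "content"


def build_records(source_dir, extension):
    """Turn every non-empty source file with the given extension into one record."""
    records = []
    for path in sorted(Path(source_dir).rglob(f"*{extension}")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if text.strip():  # skip empty or whitespace-only files
            records.append({TEXT_COLUMN: text})
    return records


def write_jsonl(records, out_path):
    """Write one JSON object per line (JSON Lines)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `write_jsonl(build_records("my_repo", ".lua"), "train.jsonl")` would produce a file with one `{"content": "..."}` object per line, which can then be loaded with `datasets.load_dataset("json", data_files="train.jsonl")`.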
