Is the 14-programming-language dataset uploaded to Hugging Face? Is there any other way to download the data?

#201
by MukeshSharma - opened

I am looking for the programming language dataset that was used to fine-tune the model. Where can I get it?

They are worlds away from "code-davinci-003", which has now been surpassed by "gpt-3.5-turbo" with better results at a third of the price, but these are the best models I found:

As for data, I would just search for "GitHub" under Datasets on Hugging Face to find something to fine-tune on. For example, "codeparrot" has a few good ones.
Filter them down to the language you want to fine-tune on for better results:
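If it helps, here is a minimal sketch of the "filter down to one language" step. It assumes each record is a dict with a `path` field and optionally a `language` field (the codeparrot GitHub code dataset appears to use fields like these, but check the dataset card). The extension table and the function names are hypothetical, and it only uses the standard library, so it runs on an in-memory sample:

```python
from pathlib import PurePosixPath
from typing import Iterable, Iterator, Optional, Set

# Hypothetical extension -> language table; extend it for the languages you need.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".java": "Java",
    ".go": "Go",
    ".rs": "Rust",
    ".c": "C",
    ".cpp": "C++",
}


def detect_language(record: dict) -> Optional[str]:
    """Prefer an explicit 'language' field; otherwise guess from the file extension."""
    if record.get("language"):
        return record["language"]
    suffix = PurePosixPath(record.get("path", "")).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)  # None for unknown extensions


def filter_by_language(records: Iterable[dict], wanted: Set[str]) -> Iterator[dict]:
    """Yield only the records whose detected language is in `wanted`."""
    for record in records:
        if detect_language(record) in wanted:
            yield record


sample = [
    {"path": "src/app.py", "code": "print('hi')", "language": "Python"},
    {"path": "lib/util.rs", "code": "fn main() {}"},
    {"path": "README.md", "code": "# Notes"},
]
print([r["path"] for r in filter_by_language(sample, {"Python"})])
```

If you load the data with the `datasets` library instead, the same predicate should plug into `Dataset.filter`, e.g. `ds.filter(lambda r: detect_language(r) in {"Python"})`. I haven't tested that against the codeparrot data, so treat it as a starting point.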

I'm looking to do about the same. Best of luck!
