What coding dataset was used to train this model?

#1
by rombodawg - opened


Also, if you're interested, I have two datasets for code training, in case you want to make more models.

One is uncensored only, which may lead to some loss of logical function:
https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k

And one that is meant to be lossless, preserving logical function while adding coding ability:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored

Let's talk about it; I'm interested.
I used the dataset in my profile (122k).
I checked your datasets; you should convert them to Llama-2 format like mine. Convert them, add my dataset, and build a new combined dataset from all of them. Then I can fine-tune on it as soon as possible.
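The conversion mentioned above can be sketched roughly as follows. This is a minimal example, assuming instruction/response pairs with `instruction` and `output` fields (the actual column names in the datasets may differ) and targeting the standard Llama-2 `[INST]` chat markup:

```python
# Minimal sketch: wrap instruction/response pairs in Llama-2 [INST] markup.
# Field names ("instruction", "output") are assumptions -- adjust them to
# match the real dataset schema before converting.

def to_llama2_format(example, system_prompt=None):
    """Convert one instruction/response pair into a Llama-2 chat string."""
    instruction = example["instruction"]
    response = example["output"]
    if system_prompt:
        # Optional system prompt goes inside <<SYS>> tags before the turn.
        instruction = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{instruction}"
    return {"text": f"<s>[INST] {instruction} [/INST] {response} </s>"}

rows = [
    {"instruction": "Write a Python one-liner that reverses a string.",
     "output": "s[::-1]"},
]
converted = [to_llama2_format(r) for r in rows]
print(converted[0]["text"])
```

With the `datasets` library, the same function could be applied over a whole dataset via `dataset.map(to_llama2_format)` before merging the sources.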

How did you create your 122k dataset? Was it created using GPT-4 prompting, or was it sourced from somewhere on Hugging Face?

emre/llama-2-instruct-121k-code
I took it from another repo.
