Replete-AI/code_bagel
Make the ultimate coding fine-tune to compete with the likes of closed-source models using the code_bagel dataset!
Made by @rombodawg of RepleteAi, the code_bagel dataset contains over 800 million tokens of deduplicated and uncensored code, drawn only from reputable sources on Hugging Face. The data is formatted in the Alpaca instruct format for ease of use in training.
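For readers new to the Alpaca instruct format, the sketch below shows how one record is typically turned into a training string. It assumes the Stanford Alpaca prompt template and its field names (`instruction`, an optional `input`, and `output`); whether code_bagel uses exactly these column names is an assumption to check against the dataset card, and `format_alpaca` is a hypothetical helper, not part of any library.

```python
# Minimal sketch: render one Alpaca-style record into a single training string.
# Assumption: records use the Stanford Alpaca field names "instruction",
# "input" (may be empty or missing) and "output"; verify against the dataset.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def format_alpaca(record: dict) -> str:
    """Return the prompt followed by the target output (hypothetical helper)."""
    instruction = record["instruction"].strip()
    # Treat a missing or None "input" the same as an empty one.
    context = (record.get("input") or "").strip()
    if context:
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    return prompt + record["output"]


if __name__ == "__main__":
    example = {
        "instruction": "Write a Python function that returns the square of x.",
        "input": "",
        "output": "def square(x):\n    return x * x",
    }
    print(format_alpaca(example))
```

With the Hugging Face `datasets` library, the records could be loaded via `load_dataset("Replete-AI/code_bagel", split="train")` and passed through a function like this, provided the column names match the assumption above. Choosing the no-input template when `input` is empty mirrors how the original Alpaca data handles instructions that need no extra context.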